Linux-ext4 Archive on lore.kernel.org
* Re: Linux 5.3-rc8
       [not found] <CAHk-=whBQ+6c-h+htiv6pp8ndtv97+45AH9WvdZougDRM6M4VQ@mail.gmail.com>
@ 2019-09-10  4:21 ` Ahmed S. Darwish
  2019-09-10 11:33   ` Linus Torvalds
                     ` (3 more replies)
  0 siblings, 4 replies; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-10  4:21 UTC (permalink / raw)
  To: Theodore Ts'o, Andreas Dilger, Linus Torvalds
  Cc: Jan Kara, zhangjs, linux-ext4, linux-kernel

Hi,

On Sun, Sep 08, 2019 at 01:59:27PM -0700, Linus Torvalds wrote:
> So we probably didn't strictly need an rc8 this release, but with LPC
> and the KS conference travel this upcoming week it just makes
> everything easier.
>

The commit b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), [1]
which was merged in v5.3-rc1, *always* leads to a blocked boot on my
system due to low entropy.

The hardware is not a VM: it's a Thinkpad E480 (i5-8250U CPU), with
a standard Arch user-space.

It was discovered through bisecting the problem v5.2 => v5.3-rc1,
since v5.2 never had any similar issues. The issue still persists in
v5.3-rc8: reverting that commit always fixes the problem.

It seems that batching the directory lookup I/O requests (which are
possibly a lot during boot) is minimizing sources of disk-activity-
induced entropy? [2] [3]

Can this even be considered a user-space breakage? I'm honestly not
sure. On my modern RDRAND-capable x86, just running rng-tools rngd(8)
early-on fixes the problem. I'm not sure about the status of older
CPUs though.

Thanks,

[1]
  commit b03755ad6f33b7b8cd7312a3596a2dbf496de6e7
  Author: zhangjs <zachary@baishancloud.com>
  Date:   Wed Jun 19 23:41:29 2019 -0400

      ext4: make __ext4_get_inode_loc plug

      Add a blk_plug to prevent the inode table readahead from being
      submitted as small I/O requests.

      Signed-off-by: zhangjs <zachary@baishancloud.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>

[2] https://lkml.kernel.org/r/20190619122457.GF27954@quack2.suse.cz

[3] block/blk-core.c :: blk_start_plug()

--
darwi
http://darwish.chasingpointers.com

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-10  4:21 ` Linux 5.3-rc8 Ahmed S. Darwish
@ 2019-09-10 11:33   ` Linus Torvalds
  2019-09-10 12:21     ` Linus Torvalds
  2019-09-10 17:33     ` Ahmed S. Darwish
  2019-09-10 11:56   ` Theodore Y. Ts'o
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-10 11:33 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, zhangjs, linux-ext4,
	Linux List Kernel Mailing

On Tue, Sep 10, 2019 at 5:21 AM Ahmed S. Darwish <darwish.07@gmail.com> wrote:
>
> The commit b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), [1]
> which was merged in v5.3-rc1, *always* leads to a blocked boot on my
> system due to low entropy.

Exactly what is it that blocks on entropy? Nobody should do that
during boot, because on some systems entropy is really really low
(think flash memory with polling IO etc).

That said, I would have expected that any PC gets plenty of entropy.
Are you sure it's entropy that is blocking, and not perhaps some odd
"forgot to unplug" situation?

> Can this even be considered a user-space breakage? I'm honestly not
> sure. On my modern RDRAND-capable x86, just running rng-tools rngd(8)
> early-on fixes the problem. I'm not sure about the status of older
> CPUs though.

It's definitely breakage, although rather odd. I would have expected
us to have other sources of entropy than just the disk. Did we stop
doing low bits of TSC from timer interrupts etc?

Ted, either way - ext4 IO patterns or random number entropy - this is
your code. Comments?

                 Linus


* Re: Linux 5.3-rc8
  2019-09-10  4:21 ` Linux 5.3-rc8 Ahmed S. Darwish
  2019-09-10 11:33   ` Linus Torvalds
@ 2019-09-10 11:56   ` Theodore Y. Ts'o
  2019-09-16 10:33     ` Christoph Hellwig
  2019-10-03 21:10   ` Jon Masters
  2019-10-03 21:31   ` Jon Masters
  3 siblings, 1 reply; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-10 11:56 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Andreas Dilger, Linus Torvalds, Jan Kara, zhangjs, linux-ext4,
	linux-kernel

On Tue, Sep 10, 2019 at 06:21:07AM +0200, Ahmed S. Darwish wrote:
> 
> The commit b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), [1]
> which was merged in v5.3-rc1, *always* leads to a blocked boot on my
> system due to low entropy.
> 
> The hardware is not a VM: it's a Thinkpad E480 (i5-8250U CPU), with
> a standard Arch user-space.

Hmm, I'm not seeing this on a Dell XPS 13 (model 9380) using a Debian
Bullseye (Testing) running a rc4+ kernel.

This could be because Debian is simply doing more I/O; or it could be
because I don't have some package installed which is trying to read
from /dev/random or calling getrandom(2).  Previously, Fedora ran into
blocking issues because of some FIPS compliance patches to some
userspace daemons.  So it's going to be very user space dependent and
package dependent.

> It seems that batching the directory lookup I/O requests (which are
> possibly a lot during boot) is minimizing sources of disk-activity-
> induced entropy? [2] [3]
> 
> Can this even be considered a user-space breakage? I'm honestly not
> sure. On my modern RDRAND-capable x86, just running rng-tools rngd(8)
> early-on fixes the problem. I'm not sure about the status of older
> CPUs though.

You can probably also fix this problem by adding random.trust_cpu=true
to the boot command line, or by enabling CONFIG_RANDOM_TRUST_CPU.
This obviously assumes that you trust Intel's implementation of
RDRAND, but that's true regardless of whether you use rngd
or the kernel config option.

As for whether it's considered user-space breakage: that's tough.
File system performance improvements can cause a reduced amount of
I/O, and that can cause less entropy to be collected, and depending on
a complex combination of kernel config options, distribution-specific
patches, and what packages are loaded, that could potentially cause
boot hangs waiting for entropy.  Does that mean we can't make any
file system performance improvements?  Surely that doesn't seem like
the right answer.

It would be useful to figure out what process is blocking waiting on
entropy, since in general, trying to rely on cryptographic entropy in
early boot, especially if it is to generate cryptographic keys, is
going to be more dangerous compared to a "just in time" approach to
generating crypto keys.  So this could also be considered a userspace
bug, depending on your point of view...

					- Ted


* Re: Linux 5.3-rc8
  2019-09-10 11:33   ` Linus Torvalds
@ 2019-09-10 12:21     ` Linus Torvalds
  2019-09-10 17:33     ` Ahmed S. Darwish
  1 sibling, 0 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-10 12:21 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, zhangjs, linux-ext4,
	Linux List Kernel Mailing

On Tue, Sep 10, 2019 at 12:33 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Are you sure it's entropy that is blocking, and not perhaps some odd
> "forgot to unplug" situation?

Looking at that code, it's all trivial, and it definitely unplugs properly.

Lack of entropy still sounds _very_ strange. Are you doing
something odd at boot?

Does the boot continue if you press keys on the keyboard, or how did
you decide it was about entropy?

I guess sysrq-'t' followed by enough keyboard input to unblock the
boot process should give you something in dmesg that shows what is
blocked?

                Linus


* Re: Linux 5.3-rc8
  2019-09-10 11:33   ` Linus Torvalds
  2019-09-10 12:21     ` Linus Torvalds
@ 2019-09-10 17:33     ` Ahmed S. Darwish
  2019-09-10 17:47       ` Reindl Harald
  2019-09-10 18:21       ` Linus Torvalds
  1 sibling, 2 replies; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-10 17:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml

On Tue, Sep 10, 2019 at 12:33:12PM +0100, Linus Torvalds wrote:
> On Tue, Sep 10, 2019 at 5:21 AM Ahmed S. Darwish <darwish.07@gmail.com> wrote:
> >
> > The commit b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), [1]
> > which was merged in v5.3-rc1, *always* leads to a blocked boot on my
> > system due to low entropy.
>
> Exactly what is it that blocks on entropy? Nobody should do that
> during boot, because on some systems entropy is really really low
> (think flash memory with polling IO etc).
>

Ok, I've tracked it down further. It's unfortunately GDM
intentionally blocking on a getrandom(buf, 16, 0).

Booting the system with an straced GDM service
("ExecStart=strace -f /usr/bin/gdm") reveals:

  ...
  [  3.779375] strace[262]: [pid   323] execve("/usr/lib/gnome-session-binary",
                                                 ... /* 28 vars */) = 0
  ...
  [  4.019227] strace[262]: [pid   323] getrandom( <unfinished ...>
  [ 79.601433] kernel: random: crng init done
  [ 79.601443] kernel: random: 3 urandom warning(s) missed due to ratelimiting
  [ 79.601262] strace[262]: [pid   323] <... getrandom resumed>..., 16, 0) = 16
  [ 79.601262] strace[262]: [pid   323] getrandom(..., 16, 0) = 16
  [ 79.603041] strace[262]: [pid   323] getrandom(..., 16, 0) = 16
  [ 79.603041] strace[262]: [pid   323] getrandom(..., 16, 0) = 16
  [ 79.603041] strace[262]: [pid   323] getrandom(..., 16, 0) = 16

As can be seen in the timestamps, the GDM boot was only continued
by typing randomly on the keyboard..
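The timestamps in the trace above give a rough measure of how long the call stayed blocked. A minimal sketch, assuming journal lines shaped like the two copied below from that trace; the regular expression and the helper names are illustrative only and not part of strace or any other tool:

```python
import re

# Two lines copied from the trace above: the getrandom() call being
# suspended, and the same call resuming once the CRNG was initialized.
LOG = """\
[  4.019227] strace[262]: [pid   323] getrandom( <unfinished ...>
[ 79.601262] strace[262]: [pid   323] <... getrandom resumed>..., 16, 0) = 16
"""

# Hypothetical pattern: a bracketed seconds timestamp at the start of a line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def timestamp_of(line):
    """Return the leading timestamp of a log line, in seconds."""
    match = TIMESTAMP.match(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    return float(match.group(1))


def blocked_seconds(log):
    """Seconds between the suspended getrandom() call and its resumption."""
    lines = log.splitlines()
    start = next(l for l in lines if "getrandom(" in l and "<unfinished" in l)
    end = next(l for l in lines if "getrandom resumed" in l)
    return timestamp_of(end) - timestamp_of(start)


print(f"getrandom() stayed blocked for about {blocked_seconds(LOG):.1f} s")
```

For the two lines above this comes out at roughly 75 seconds, all of it spent waiting for "crng init done".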

> That said, I would have expected that any PC gets plenty of entropy.
> Are you sure it's entropy that is blocking, and not perhaps some odd
> "forgot to unplug" situation?
>

Yes, doing any of the steps below makes the problem reliably disappear:

  - boot param "random.trust_cpu=on"
  - rngd(8) enabled at boot (entropy source: x86 RDRAND + jitter)
  - pressing random 3 or 4 keyboard keys while GDM boot is stuck

> > Can this even be considered a user-space breakage? I'm honestly not
> > sure. On my modern RDRAND-capable x86, just running rng-tools rngd(8)
> > early-on fixes the problem. I'm not sure about the status of older
> > CPUs though.
>
> It's definitely breakage, although rather odd. I would have expected
> us to have other sources of entropy than just the disk. Did we stop
> doing low bits of TSC from timer interrupts etc?
>

Exactly.

While gnome-session is obviously at fault here by requiring
*blocking* randomness at the boot path, it's still not requesting
much, just (5 * 16) bytes to be exact.

I guess an x86 laptop should be able to provide that, even without
RDRAND / random.trust_cpu=on (TSC jitter, etc.) ?

thanks,
--darwi

> Ted, either way - ext4 IO patterns or random number entropy - this is
> your code. Comments?
>
>                  Linus


* Re: Linux 5.3-rc8
  2019-09-10 17:33     ` Ahmed S. Darwish
@ 2019-09-10 17:47       ` Reindl Harald
  2019-09-10 18:21       ` Linus Torvalds
  1 sibling, 0 replies; 211+ messages in thread
From: Reindl Harald @ 2019-09-10 17:47 UTC (permalink / raw)
  To: Ahmed S. Darwish, Linus Torvalds
  Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml



On 10.09.19 at 19:33, Ahmed S. Darwish wrote:
> Yes, doing any of the steps below makes the problem reliably disappear:
> 
>   - boot param "random.trust_cpu=on"
>   - rngd(8) enabled at boot (entropy source: x86 RDRAND + jitter)
>   - pressing random 3 or 4 keyboard keys while GDM boot is stuck

And what about machines with no RDRAND or a broken one (AMD), and
nobody near the keyboard to play some song on it?


* Re: Linux 5.3-rc8
  2019-09-10 17:33     ` Ahmed S. Darwish
  2019-09-10 17:47       ` Reindl Harald
@ 2019-09-10 18:21       ` Linus Torvalds
  2019-09-11 16:07         ` Theodore Y. Ts'o
  1 sibling, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-10 18:21 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml

On Tue, Sep 10, 2019 at 6:33 PM Ahmed S. Darwish <darwish.07@gmail.com> wrote:
>
> While gnome-session is obviously at fault here by requiring
> *blocking* randomness at the boot path, it's still not requesting
> much, just (5 * 16) bytes to be exact.
>
> I guess an x86 laptop should be able to provide that, even without
> RDRAND / random.trust_cpu=on (TSC jitter, etc.) ?

Yeah, the problem is partly because we can't trust "get_cycles()"
because not all architectures have it. So we use "jiffies" for the
entropy estimation, and my guess is that it just ends up estimating
you have little to no entropy from your disk IO.

So the timestamp counter value is added to the randomness pool, but
the jitter in the TSC values isn't then used to estimate the entropy
at all.

Just out of curiosity, what happens if you apply a patch like this
(intentionally whitespace-damaged, I don't want anybody to pick it up
without thinking about it):

   diff --git a/drivers/char/random.c b/drivers/char/random.c
   index 5d5ea4ce1442..60709a7b4af1 100644
   --- a/drivers/char/random.c
   +++ b/drivers/char/random.c
   @@ -1223,6 +1223,7 @@ static void add_timer_randomness(struct
timer_rand_state *state, unsigned $
         * We take into account the first, second and third-order deltas
         * in order to make our estimate.
         */
   +    sample.jiffies += sample.cycles;
        delta = sample.jiffies - state->last_time;
        state->last_time = sample.jiffies;


which just makes the entropy estimation use the _sum_ of jiffies and
cycles as the base. On architectures that don't have a cycle counter,
it ends up being the same as it used to be (just jiffies), and on
architectures that do have a timestamp counter the TSC differences
will overwhelm the jiffies differences, so you end up effectively
using the third-order TSC difference as the entropy estimation.

Which I think is what the code really wants - it's only using jiffies
because that is the only thing _guaranteed_ to change at all. But with
the sum, you get the best of both worlds, and should basically make
the entropy estimation use the "better of two counters".
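To make the effect of that one-line change concrete, here is a simplified user-space model of the first/second/third-order delta estimate. It is a sketch only: `TimerState`, `credited_bits` and the sample values are made up for illustration, and the 11-bit cap and the shift before `fls` are assumptions about how random.c of that era credits entropy, not a copy of the kernel code.

```python
def fls(value):
    """Position of the highest set bit, 1-based; fls(0) == 0."""
    return value.bit_length()


class TimerState:
    """Per-source history, loosely modelled on struct timer_rand_state."""

    def __init__(self):
        self.last_time = 0
        self.last_delta = 0
        self.last_delta2 = 0


def credited_bits(state, timestamp, cap=11):
    """Estimate entropy from the smallest of the three delta orders."""
    delta = timestamp - state.last_time
    state.last_time = timestamp
    delta2 = delta - state.last_delta
    state.last_delta = delta
    delta3 = delta2 - state.last_delta2
    state.last_delta2 = delta2
    smallest = min(abs(delta), abs(delta2), abs(delta3))
    return min(fls(smallest >> 1), cap)  # cap is an assumed value


def credits(timestamps):
    state = TimerState()
    return [credited_bits(state, t) for t in timestamps]


# Made-up samples: events arriving every 4 jiffies, with jittery cycle counts.
jiffies = [100, 104, 108, 112]
cycles = [1_000_000, 1_004_137, 1_008_011, 1_012_350]

jiffies_only = credits(jiffies)
summed = credits([j + c for j, c in zip(jiffies, cycles)])
print("jiffies only:", jiffies_only)
print("jiffies + cycles:", summed)
```

With evenly spaced jiffies the higher-order deltas collapse to zero after the first couple of events, so nothing more is credited; once the cycle counter is added in, its jitter keeps all three deltas non-zero.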

Ted, comments? I'd hate to revert the ext4 thing just because it
happens to expose a bad thing in user space.

              Linus


* Re: Linux 5.3-rc8
  2019-09-10 18:21       ` Linus Torvalds
@ 2019-09-11 16:07         ` Theodore Y. Ts'o
  2019-09-11 16:45           ` Linus Torvalds
  2019-09-16  3:52           ` Herbert Xu
  0 siblings, 2 replies; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-11 16:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ahmed S. Darwish, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml

On Tue, Sep 10, 2019 at 07:21:54PM +0100, Linus Torvalds wrote:
> On Tue, Sep 10, 2019 at 6:33 PM Ahmed S. Darwish <darwish.07@gmail.com> wrote:
> >
> > While gnome-session is obviously at fault here by requiring
> > *blocking* randomness at the boot path, it's still not requesting
> > much, just (5 * 16) bytes to be exact.

It doesn't matter how much randomness it's requesting.  With the new
cryptographic random number generator, the CRNG is either
initialized.... or it's not.

> Just out of curiosity, what happens if you apply a patch like this
> (intentionally whitespace-damaged, I don't want anybody to pick it up
> without thinking about it) thing...

> Which I think is what the code really wants - it's only using jiffies
> because that is the only thing _guaranteed_ to change at all. But with
> the sum, you get the best of both worlds, and should basically make
> the entropy estimation use the "better of two counters".
> 
> Ted, comments? I'd hate to revert the ext4 thing just because it
> happens to expose a bad thing in user space.

Unfortunately, I very much doubt this is going to work.  That's
because the add_disk_randomness() path is only used for legacy
/dev/random (which actually only still exists because of some insane
PCI compliance issues which a number of end users really care about
--- or they care about because it makes the insane PCI compliance labs
go away).

Also, because the vast majority of disks have
/sys/block/XXX/queue/add_random set to zero by default.
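Whether a given machine matches that default can be checked from user space. A small sketch, assuming the usual sysfs layout under /sys/block; the function name `add_random_settings` is made up for this example:

```python
from pathlib import Path


def add_random_settings(sys_block="/sys/block"):
    """Map each block device name to its queue/add_random value (0 or 1)."""
    settings = {}
    root = Path(sys_block)
    if not root.is_dir():
        return settings
    for flag in sorted(root.glob("*/queue/add_random")):
        try:
            # flag is <sys_block>/<device>/queue/add_random
            settings[flag.parent.parent.name] = int(flag.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or unexpected content: skip this device
    return settings


for device, value in add_random_settings().items():
    print(f"{device}: add_random={value}")
```

A value of 0 means that completions on that disk do not feed the add_disk_randomness() path at all.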

So the way we get entropy these days for initializing the CRNG is
via the add_interrupt_randomness() path, where we do something really
fast, and we assume that we get enough uncertainty from 8 interrupts
to give us one bit of entropy (64 interrupts to give us a byte of
entropy), and that we need 512 bits of entropy to consider the CRNG
fully initialized.  (Yeah, there's a lot of conservatism in those
estimates, and so what we could do is decide to say, cut down the
number of bits needed to initialize the CRNG to be 256 bits, since
that's the size of the CHACHA20 cipher.)
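As a quick check of those numbers, the arithmetic can be written out. This only restates the figures given in this message (8 interrupts per credited bit, a 512-bit threshold, and the suggested 256-bit threshold); the function name is made up:

```python
INTERRUPTS_PER_BIT = 8  # figure quoted in this message, not read from the kernel


def interrupts_needed(entropy_bits, interrupts_per_bit=INTERRUPTS_PER_BIT):
    """Interrupts required before the given number of bits is credited."""
    return entropy_bits * interrupts_per_bit


for threshold_bits in (512, 256):
    print(f"{threshold_bits} bits -> {interrupts_needed(threshold_bits)} interrupts")
```

So the current threshold corresponds to 4096 interrupts, and halving it to 256 bits would bring that down to 2048.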

Ultimately, though, we need to find *some* way to fix userspace's
assumptions that they can always get high quality entropy in early
boot, or we need to get over people's distrust of Intel and RDRAND.
Otherwise, future performance improvements in any part of the system
which reduce the number of interrupts are always going to potentially
cause somebody's misconfigured system or badly written
applications to fail to boot.  :-(

					- Ted


* Re: Linux 5.3-rc8
  2019-09-11 16:07         ` Theodore Y. Ts'o
@ 2019-09-11 16:45           ` Linus Torvalds
  2019-09-11 17:00             ` Linus Torvalds
  2019-09-11 21:41             ` Ahmed S. Darwish
  2019-09-16  3:52           ` Herbert Xu
  1 sibling, 2 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-11 16:45 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Ahmed S. Darwish, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml

On Wed, Sep 11, 2019 at 5:07 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
> >
> > Ted, comments? I'd hate to revert the ext4 thing just because it
> > happens to expose a bad thing in user space.
>
> Unfortunately, I very much doubt this is going to work.  That's
> because the add_disk_randomness() path is only used for legacy
> /dev/random [...]
>
> Also, because the vast majority of disks have
> /sys/block/XXX/queue/add_random set to zero by default.

Gaah. I was looking at the input randomness, since I thought that was
where the added randomness that Ahmed got things to work with came
from.

And that then made me just look at the legacy disk randomness (for the
obvious disk IO reasons) and I didn't look further.

> So the way we get entropy these days for initializing the CRNG is
> via the add_interrupt_randomness() path, where we do something really
> fast, and we assume that we get enough uncertainty from 8 interrupts
> to give us one bit of entropy (64 interrupts to give us a byte of
> entropy), and that we need 512 bits of entropy to consider the CRNG
> fully initialized.  (Yeah, there's a lot of conservatism in those
> estimates, and so what we could do is decide to say, cut down the
> number of bits needed to initialize the CRNG to be 256 bits, since
> that's the size of the CHACHA20 cipher.)

So that's 4k interrupts if I counted right, and yeah, maybe Ahmed was
just close enough before, and the merging of the inode table IO then
took him below that limit.

> Ultimately, though, we need to find *some* way to fix userspace's
> assumptions that they can always get high quality entropy in early
> boot, or we need to get over people's distrust of Intel and RDRAND.

Well, even on a PC, sometimes rdrand just isn't there. AMD has screwed
it up a few times, and older Intel chips just don't have it.

So I'd be inclined to either lower the limit regardless - and perhaps
make the "user space asked for randomness much too early" be a big
*warning* instead of being a basically fatal hung machine?

                Linus


* Re: Linux 5.3-rc8
  2019-09-11 16:45           ` Linus Torvalds
@ 2019-09-11 17:00             ` Linus Torvalds
  2019-09-11 17:36               ` Theodore Y. Ts'o
  2019-09-11 21:41             ` Ahmed S. Darwish
  1 sibling, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-11 17:00 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Ahmed S. Darwish, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml

On Wed, Sep 11, 2019 at 5:45 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So I'd be inclined to either lower the limit regardless - and perhaps
> make the "user space asked for randomness much too early" be a big
> *warning* instead of being a basically fatal hung machine?

Hmm. Just testing - normally I run my laptop with TRUST_CPU enabled,
so I never see this any more, but warning (rather than waiting) is
what we still do for the kernel.

And I see

    [    0.231255] random: get_random_bytes called from
start_kernel+0x323/0x4f5 with crng_init=0

and that's this code:

        add_latent_entropy();
        add_device_randomness(command_line, strlen(command_line));
        boot_init_stack_canary();

in particular, it's the boot_init_stack_canary() thing that asks for a
random number for the canary.

I don't actually see the 'crng init done' until much much later:

    [   21.741125] random: crng init done

but part of that may be that my early boot is slow due to having an
encrypted disk and so the bootup ends up waiting for me to type the
passphrase.

But this does show that

 (a) we have the same issue in the kernel, and we don't block there

 (b) initializing the crng really can be a timing problem

The interrupt thing is only going to get worse as disks turn into
ssd's and some of them end up using polling rather than interrupts..
So we're likely to see _fewer_ interrupts in the future, not more.

            Linus


* Re: Linux 5.3-rc8
  2019-09-11 17:00             ` Linus Torvalds
@ 2019-09-11 17:36               ` Theodore Y. Ts'o
  2019-09-12  3:44                 ` Ahmed S. Darwish
  0 siblings, 1 reply; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-11 17:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ahmed S. Darwish, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml

On Wed, Sep 11, 2019 at 06:00:19PM +0100, Linus Torvalds wrote:
>     [    0.231255] random: get_random_bytes called from
> start_kernel+0x323/0x4f5 with crng_init=0
> 
> and that's this code:
> 
>         add_latent_entropy();
>         add_device_randomness(command_line, strlen(command_line));
>         boot_init_stack_canary();
> 
> in particular, it's the boot_init_stack_canary() thing that asks for a
> random number for the canary.
> 
> I don't actually see the 'crng init done' until much much later:
> 
>     [   21.741125] random: crng init done

Yes, that's super early in the boot sequence.  IIRC the stack canary
gets reinitialized later (or maybe it was only for the other CPU's in
SMP mode; I don't recall the details off the top of my head).

I think this one always fails, and perhaps we should have a way of
suppressing it --- but that's correct: the in-kernel interface doesn't
block.

The /dev/urandom device doesn't block either, despite security
eggheads continually asking me to change it to block a la getrandom(2),
but I have always pushed back because I *know* changing
/dev/urandom to block would be asking for userspace regressions.

The compromise we came up with was that since getrandom(2) is a new
interface, we could make this have the behavior that the security
heads wanted, which is to make blocking unconditional, since the
theory was that *this* interface would be sane, and that userspace
applications which used it too early were buggy, and we could make it
*their* problem.
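For reference, an application that cannot afford to hang can already ask getrandom(2) not to wait by passing GRND_NONBLOCK, and decide for itself what to do when the CRNG is not ready. A minimal sketch using Python's wrapper around the syscall (Linux only); `try_getrandom` is a made-up helper name, and what the caller does with a `None` result is left open:

```python
import os


def try_getrandom(nbytes=16):
    """Return nbytes of random data, or None if the CRNG is not ready yet."""
    try:
        # GRND_NONBLOCK makes the call fail with EAGAIN instead of blocking;
        # Python surfaces that as BlockingIOError.
        return os.getrandom(nbytes, os.GRND_NONBLOCK)
    except BlockingIOError:
        return None


data = try_getrandom()
if data is None:
    print("CRNG not initialized yet; a plain getrandom() would have blocked")
else:
    print(f"got {len(data)} random bytes without blocking")
```

This does not produce entropy any sooner; it only turns a silent hang into a condition the caller can see and handle.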

People have suggested adding a new getrandom flag, GRND_I_KNOW_THIS_IS_INSECURE,
or some such, which wouldn't block and would return "best efforts"
randomness.  I haven't been super enthusiastic about such a flag
because I *know* it would be insecure.   However, the next time a massive
security bug shows up on the front pages of the Wall Street Journal,
or on some web site such as https://factorable.net, it won't be the kernel's fault
since the flag will be GRND_INSECURE_BROKEN_APPLICATION, or some such.
It doesn't really solve the problem, though.

> But this does show that
> 
>  (a) we have the same issue in the kernel, and we don't block there

Ultimately, I think the only right answer is to make it the
bootloader's responsibility to get us some decent entropy at boot
time.  There are patches to allow ARM systems to pass in entropy via
the device tree.  And in theory (assuming you trust the UEFI BIOS ---
stop laughing in the back!) we can use that to get entropy which will
solve the problem for UEFI boot systems.  I've been talking to Ron
Minnich about trying to get this support into the NERF bootloader, at
which point new servers from the Open Compute Project will have a
solution as well.  (We can probably also get solutions for Chrome OS
devices, since those have TPM-like devices which are trusted to have a
competently engineered hardware RNG --- I'm not sure I would trust all
TPM devices in commodity hardware, but again, at least we can shift
blame off of the kernel.  :-P)

Still, these are all point solutions, and don't really solve the
problem on older systems, or non-x86 systems.

>  (b) initializing the crng really can be a timing problem
> 
> The interrupt thing is only going to get worse as disks turn into
> ssd's and some of them end up using polling rather than interrupts..
> So we're likely to see _fewer_ interrupts in the future, not more.

Yeah, agreed.  Maybe we should have an "insecure_randomness" boot
option which blindly forces the CRNG to be initialized at boot, so
that at least people can get to a command line, if insecurely?  I
don't have any good ideas about how to solve this problem in general.
:-( :-( :-(

						- Ted


* Re: Linux 5.3-rc8
  2019-09-11 16:45           ` Linus Torvalds
  2019-09-11 17:00             ` Linus Torvalds
@ 2019-09-11 21:41             ` Ahmed S. Darwish
  2019-09-11 22:37               ` Ahmed S. Darwish
  1 sibling, 1 reply; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-11 21:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml

On Wed, Sep 11, 2019 at 05:45:38PM +0100, Linus Torvalds wrote:
> On Wed, Sep 11, 2019 at 5:07 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
> > >
> > > Ted, comments? I'd hate to revert the ext4 thing just because it
> > > happens to expose a bad thing in user space.
> >
> > Unfortunately, I very much doubt this is going to work.  That's
> > because the add_disk_randomness() path is only used for legacy
> > /dev/random [...]
> >
> > Also, because the vast majority of disks have
> > /sys/block/XXX/queue/add_random set to zero by default.
> 
> Gaah. I was looking at the input randomness, since I thought that was
> where the added randomness that Ahmed got things to work with came
> from.
> 
> And that then made me just look at the legacy disk randomness (for the
> obvious disk IO reasons) and I didn't look further.
>

Yup, I confirm that the quick patch kept the situation as-is. I was
going to debug why, but now we know the answer..

> > So the way we get entropy these days for initializing the CRNG is
> > via the add_interrupt_randomness() path, where we do something really
> > fast, and we assume that we get enough uncertainty from 8 interrupts
> > to give us one bit of entropy (64 interrupts to give us a byte of
> > entropy), and that we need 512 bits of entropy to consider the CRNG
> > fully initialized.  (Yeah, there's a lot of conservatism in those
> > estimates, and so what we could do is decide to say, cut down the
> > number of bits needed to initialize the CRNG to be 256 bits, since
> > that's the size of the CHACHA20 cipher.)
> 
> So that's 4k interrupts if I counted right, and yeah, maybe Ahmed was
> just close enough before, and the merging of the inode table IO then
> took him below that limit.
>
> > Ultimately, though, we need to find *some* way to fix userspace's
> > assumptions that they can always get high quality entropy in early
> > boot, or we need to get over people's distrust of Intel and RDRAND.
>
> Well, even on a PC, sometimes rdrand just isn't there. AMD has screwed
> it up a few times, and older Intel chips just don't have it.
> 
> So I'd be inclined to either lower the limit regardless -

ACK :)

> and perhaps make the "user space asked for randomness much too
> early" be a big *warning* instead of being a basically fatal hung
> machine?

Hmmm, regarding "randomness request much too early", how much is time
really a factor here?

I tested leaving the machine even for 15+ minutes, and it still didn't
continue booting: the boot is practically blocked forever...

Or is the theory that hopefully once the machine is un-stuck, more
sources of entropy will be available? If that's the case, then
possibly (rate-limited):

  "urandom: process XX asked for YY bytes. CRNG not yet initialized"

> Linus

thanks,

--
darwi
http://darwish.chasingpointers.com


* Re: Linux 5.3-rc8
  2019-09-11 21:41             ` Ahmed S. Darwish
@ 2019-09-11 22:37               ` Ahmed S. Darwish
  0 siblings, 0 replies; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-11 22:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml

On Wed, Sep 11, 2019 at 11:41:44PM +0200, Ahmed S. Darwish wrote:
> On Wed, Sep 11, 2019 at 05:45:38PM +0100, Linus Torvalds wrote:
[...]
> >
> > Well, even on a PC, sometimes rdrand just isn't there. AMD has screwed
> > it up a few times, and older Intel chips just don't have it.
> > 
> > So I'd be inclined to either lower the limit regardless -
> 
> ACK :)
> 
> > and perhaps make the "user space asked for randomness much too
> > early" be a big *warning* instead of being a basically fatal hung
> > machine?
> 
> Hmmm, regarding "randomness request much too early", how much is time
> really a factor here?
> 
> I tested leaving the machine even for 15+ minutes, and it still didn't
> continue booting: the boot is practically blocked forever...
> 
> Or is the theory that hopefully once the machine is un-stuck, more
> sources of entropy will be available? If that's the case, then
> possibly (rate-limited):
> 
>   "urandom: process XX asked for YY bytes. CRNG not yet initialized"
>
     ^
     getrandom: ....

(since urandom always succeeds, even if CRNG is not inited, and
 it already prints a very similar warning in that case anyway..)

thanks,
--darwi


* Re: Linux 5.3-rc8
  2019-09-11 17:36               ` Theodore Y. Ts'o
@ 2019-09-12  3:44                 ` Ahmed S. Darwish
  2019-09-12  8:25                   ` Theodore Y. Ts'o
  0 siblings, 1 reply; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-12  3:44 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Linus Torvalds, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	lkml

Hi Ted,

On Wed, Sep 11, 2019 at 01:36:24PM -0400, Theodore Y. Ts'o wrote:
> On Wed, Sep 11, 2019 at 06:00:19PM +0100, Linus Torvalds wrote:
> >     [    0.231255] random: get_random_bytes called from
> > start_kernel+0x323/0x4f5 with crng_init=0
> >
> > and that's this code:
> >
> >         add_latent_entropy();
> >         add_device_randomness(command_line, strlen(command_line));
> >         boot_init_stack_canary();
> >
> > in particular, it's the boot_init_stack_canary() thing that asks for a
> > random number for the canary.
> >
> > I don't actually see the 'crng init done' until much much later:
> >
> >     [   21.741125] random: crng init done
>
> Yes, that's super early in the boot sequence.  IIRC the stack canary
> gets reinitialized later (or maybe it was only for the other CPU's in
> SMP mode; I don't recall the details off the top of my head).
>
> I think this one always fails, and perhaps we should have a way of
> suppressing it --- but that's correct, the in-kernel interface doesn't
> block.
>
> The /dev/urandom device doesn't block either, despite security
> eggheads continually asking me to change it to block ala getrandom(2),
> but I have always pushed back because I *know* changing
> /dev/urandom to block would be asking for userspace regressions.
>
> The compromise we came up with was that since getrandom(2) is a new
> interface, we could make this have the behavior that the security
> heads wanted, which is to make blocking unconditional, since the
> theory was that *this* interface would be sane, and that userspace
> applications which used it too early were buggy, and we could make it
> *their* problem.
>

Hmmmm, IMHO it's almost impossible to define "too early" here... Does
it mean applications in the critical boot path? Does gnome-session =>
libICE => libbsd => getentropy() => getrandom() => generated MIT magic
cookie count as being too early? It's very hazy...

getrandom(2) basically has no guaranteed upper bound for the waiting
time. And in the report I submitted in the parent thread, the upper
bound is really "infinitely locked"...

Here is a trace_printk() log of all the getrandom() calls done from
system boot:

    systemd-random--179   2.510228: getrandom(512 bytes, flags = 1)
    systemd-random--179   2.510239: getrandom(512 bytes, flags = 0)
            polkitd-294   3.903699: getrandom(8 bytes, flags = 1)
            polkitd-294   3.904191: getrandom(8 bytes, flags = 1)

                          ... + 45 similar instances

    gnome-session-b-327   4.400620: getrandom(16 bytes, flags = 0)

                          ... boot blocks here, until
                              pressing some keys

    gnome-session-b-327   49.32140: getrandom(16 bytes, flags = 0)

                          ... + 3 similar instances

        gnome-shell-335   49.553594: getrandom(8 bytes, flags = 1)
        gnome-shell-335   49.553600: getrandom(8 bytes, flags = 1)

                          ... + 10 similar instances

           Xwayland-345   50.129401: getrandom(8 bytes, flags = 1)
           Xwayland-345   50.129491: getrandom(8 bytes, flags = 1)

                          ... + 9 similar instances

        gnome-shell-335   50.487543: getrandom(8 bytes, flags = 1)
        gnome-shell-335   50.487550: getrandom(8 bytes, flags = 1)

                          ... + 79 similar instances

      gsd-xsettings-390   51.431638: getrandom(8 bytes, flags = 1)
      gsd-clipboard-389   51.432693: getrandom(8 bytes, flags = 1)
      gsd-xsettings-390   51.433899: getrandom(8 bytes, flags = 1)
      gsd-smartcard-388   51.433924: getrandom(110 bytes, flags = 0)
      gsd-smartcard-388   51.433936: getrandom(256 bytes, flags = 0)

                          ... + 3 similar instances

And it goes on, including processes like gsd-power-, gsd-xsettings-,
gsd-clipboard-, gsd-print-notif, gsd-clipboard-, gsd-color,
gst-keyboard-, etc.

What's the boundary of "too early" here? It's kinda undefinable..
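
For reading the log above: assuming the flag values from the uapi
header (GRND_NONBLOCK = 0x0001, GRND_RANDOM = 0x0002), "flags = 1" is a
non-blocking request and "flags = 0" is the kind that can stall the
boot. A minimal sketch of that decoding; the helper name is made up for
illustration and is not part of any kernel or libc interface:

```c
#include <sys/random.h>

/*
 * Hypothetical helper (illustration only): returns 1 if a getrandom(2)
 * call with these flags may sleep waiting for entropy, 0 if it fails
 * fast with EAGAIN instead.
 */
static int getrandom_may_block(unsigned int flags)
{
	return (flags & GRND_NONBLOCK) == 0;
}
```

Read this way, the polkitd and gnome-shell calls (flags = 1) cannot
hang, while the gnome-session-b call (flags = 0) is the one that sits
there until keys are pressed.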

> People have suggested adding a new getrandom flag, GRND_I_KNOW_THIS_IS_INSECURE,
> or some such, which wouldn't block and would return "best efforts"
> randomness.  I haven't been super enthusiastic about such a flag
> because I *know* it would be abused.   However, the next time a massive
> security bug shows up on the front pages of the Wall Street Journal,
> or on some web site such as https://factorable.net, it won't be the kernel's fault
> since the flag will be GRND_INSECURE_BROKEN_APPLICATION, or some such.
> It doesn't really solve the problem, though.
>

At least for generating the MIT cookie, it would make some sort of
sense... Really caring about truly random-numbers while using Xorg
is almost like perfecting a hard-metal door for the paper house ;)

(Jokes aside, I understand that this cannot be the solution)

> > But this does show that
> >
> >  (a) we have the same issue in the kernel, and we don't block there
>
> Ultimately, I think the only right answer is to make it the
> bootloader's responsibility to get us some decent entropy at boot
> time.

Just 8 days ago, systemd v243 was released, with systemd-random-seed(8)
now supporting *crediting* the entropy while loading the random seed:

    https://systemd.io/RANDOM_SEEDS

systemd-random-seed does something similar to what OpenBSD does, by
preserving the seed across reboots at /var/lib/systemd/random-seed.

This is not enabled by default though. Will distributions enable it by
default in the future? I have no idea \_(.)_/
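
For reference, crediting a saved seed (as opposed to merely mixing it
in by writing to /dev/urandom) goes through the RNDADDENTROPY ioctl
from <linux/random.h>, which needs CAP_SYS_ADMIN. A minimal sketch
under stated assumptions: this is not systemd's actual code, the helper
names are made up, and crediting a full 8 bits per seed byte is only an
illustration of the policy decision involved:

```c
#include <fcntl.h>
#include <linux/random.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: build an RNDADDENTROPY request that credits
 * 8 bits of entropy per seed byte.  The caller frees the result.
 */
static struct rand_pool_info *make_credit_request(const void *seed, size_t len)
{
	struct rand_pool_info *info = malloc(sizeof(*info) + len);

	if (!info)
		return NULL;
	info->entropy_count = (int)(len * 8);	/* in bits */
	info->buf_size = (int)len;		/* in bytes */
	memcpy(info->buf, seed, len);
	return info;
}

/* Hypothetical helper: returns 0 on success, -1 on error (e.g. EPERM). */
static int credit_seed(const void *seed, size_t len)
{
	struct rand_pool_info *info = make_credit_request(seed, len);
	int fd, ret;

	if (!info)
		return -1;
	fd = open("/dev/urandom", O_WRONLY);
	if (fd < 0) {
		free(info);
		return -1;
	}
	ret = ioctl(fd, RNDADDENTROPY, info);
	close(fd);
	free(info);
	return ret;
}
```

Whether the credited amount is justified depends entirely on how the
seed file was produced and protected, which is the part a distribution
would have to decide.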

> There are patches to allow ARM systems to pass in entropy via
> the device tree.  And in theory (assuming you trust the UEFI BIOS ---
> stop laughing in the back!) we can use that get entropy which will
> solve the problem for UEFI boot systems.

Hmmmm ...

> I've been talking to Ron
> Minnich about trying to get this support into the NERF bootloader, at
> which point new servers from the Open Compute Project will have a
> solution as well.  (We can probably also get solutions for Chrome OS
> devices, since those have TPM-like which are trusted to have a
> competently engineered hardware RNG --- I'm not sure I would trust all
> TPM devices in commodity hardware, but again, at least we can shift
> blame off of the kernel.  :-P)
>
> Still, these are all point solutions, and don't really solve the
> problem on older systems, or non-x86 systems.
>

For non-x86 _embedded_ systems at least, usually the BSP provider
enables the necessary hwrng driver in question and credit its entropy;
e.g. 62f95ae805fa (hwrng: omap - Set default quality).

> >  (b) initializing the crng really can be a timing problem
> >
> > The interrupt thing is only going to get worse as disks turn into
> > ssd's and some of them end up using polling rather than interrupts..
> > So we're likely to see _fewer_ interrupts in the future, not more.
>
> Yeah, agreed.  Maybe we should have an "insecure_randomness" boot
> option which blindly forces the CRNG to be initialized at boot, so
> that at least people can get to a command line, if insecurely?  I
> don't have any good ideas about how to solve this problem in general.
> :-( :-( :-(
>
> 						- Ted

Yeah, this is a hard engineering problem. You've earlier summarized it
perfectly here:

    https://lore.kernel.org/r/20180514003034.GI14763@thunk.org

I guess, to summarize earlier e-mails, a nice path would be:

    1. Cutting down the number of bits needed to initialize the CRNG
       to 256 bits (CHACHA20 cipher)

    2. Complaining loudly when getrandom() is used while the CRNG is
       not yet initialized.

    3. Hopefully #2 will force distributions to act: either trusting
       RDRAND when it's sane, configuring systemd-random-seed(8) to
       credit the entropy by default, etc.

Thanks!

--
darwi
http://darwish.chasingpointers.com


* Re: Linux 5.3-rc8
  2019-09-12  3:44                 ` Ahmed S. Darwish
@ 2019-09-12  8:25                   ` Theodore Y. Ts'o
  2019-09-12 11:34                     ` Linus Torvalds
  2019-09-14  9:25                     ` Ahmed S. Darwish
  0 siblings, 2 replies; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-12  8:25 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Linus Torvalds, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	lkml

On Thu, Sep 12, 2019 at 05:44:21AM +0200, Ahmed S. Darwish wrote:
> > People have suggested adding a new getrandom flag, GRND_I_KNOW_THIS_IS_INSECURE,
> > or some such, which wouldn't block and would return "best efforts"
> > randomness.  I haven't been super enthusiastic about such a flag
> > because I *know* it would be abused.   However, the next time a massive
> > security bug shows up on the front pages of the Wall Street Journal,
> > or on some web site such as https://factorable.net, it won't be the kernel's fault
> > since the flag will be GRND_INSECURE_BROKEN_APPLICATION, or some such.
> > It doesn't really solve the problem, though.

Hmm, one thought might be GRND_FAILSAFE, which will wait up to two
minutes before returning "best efforts" randomness and issuing a huge
massive warning if it is triggered?
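
A userspace approximation of the bounded wait, as a minimal sketch:
GRND_FAILSAFE does not exist, so this only emulates the "wait up to N,
then give up" half by polling with GRND_NONBLOCK. The helper name, the
10 ms polling interval and the ETIMEDOUT convention are assumptions for
illustration; the "best efforts" output after the timeout would need
kernel support.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* for nanosleep(2) under strict -std= modes */
#endif
#include <errno.h>
#include <sys/random.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper: like getrandom(2), but gives up after roughly
 * timeout_ms milliseconds with errno set to ETIMEDOUT if the CRNG is
 * still not initialized.  May return fewer than len bytes for large
 * requests, exactly like getrandom(2) itself.
 */
static ssize_t getrandom_bounded(void *buf, size_t len, long timeout_ms)
{
	const struct timespec nap = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	long waited_ms = 0;

	for (;;) {
		ssize_t n = getrandom(buf, len, GRND_NONBLOCK);

		if (n >= 0)
			return n;
		if (errno != EAGAIN && errno != EINTR)
			return -1;	/* real error, e.g. ENOSYS or EFAULT */
		if (waited_ms >= timeout_ms) {
			errno = ETIMEDOUT;
			return -1;
		}
		nanosleep(&nap, NULL);
		waited_ms += 10;
	}
}
```

On timeout the caller still has to decide what to do, which is the
part the huge warning would be for.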

> At least for generating the MIT cookie, it would make some sort of
> sense... Really caring about truly random-numbers while using Xorg
> is almost like perfecting a hard-metal door for the paper house ;)

For the MIT Magic Cookie, it might as well use GRND_NONBLOCK, and if
it fails due to randomness being not available, it should just fall
back to random_r(3).  Or heck, just use random_r(3) all the time,
since it's not at all secure anyway....
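
Roughly, as a minimal sketch (the helper name and the clock/pid seeding
are made up for illustration, and none of this is suitable for real
keys):

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* for random_r(3) and initstate_r(3) in glibc */
#endif
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill a non-secret cookie.  Returns 1 if all
 * bytes came from getrandom(2), 0 if the random_r(3) fallback was used.
 */
static int fill_cookie(unsigned char *buf, size_t len)
{
	struct random_data rd = { 0 };	/* must be zeroed before initstate_r() */
	char state[64];
	size_t i, done = 0;
	int32_t r;

	while (done < len) {
		ssize_t n = getrandom(buf + done, len - done, GRND_NONBLOCK);

		if (n > 0)
			done += (size_t)n;
		else if (n < 0 && errno == EINTR)
			continue;
		else
			break;	/* EAGAIN: CRNG not ready; ENOSYS: old kernel */
	}
	if (done == len)
		return 1;

	/* Not secure at all: seeded from the clock and the pid. */
	initstate_r((unsigned int)time(NULL) ^ (unsigned int)getpid(),
		    state, sizeof(state), &rd);
	for (i = done; i < len; i++) {
		random_r(&rd, &r);
		buf[i] = (unsigned char)(r >> 16);
	}
	return 0;
}
```

The point of the return value is only so the caller can log which path
was taken; either way it never blocks.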

> Just 8 days ago, systemd v243 was released, with systemd-random-seed(8)
> now supporting *crediting* the entropy while loading the random seed:
> 
>     https://systemd.io/RANDOM_SEEDS
> 
> systemd-random-seed does something similar to what OpenBSD does, by
> preserving the seed across reboots at /var/lib/systemd/random-seed.

That makes it systemd's responsibility to properly manage the random
seed file, and if the random seed file gets imaged, or if it gets read
while the system is off, that's on systemd....   which is fine.

The real problem here is that we're trying to engineer a system which
makes it safe for real cryptographic systems, but there's no way to
distinguish between real cryptographic systems where proper entropy is
critical and pretend security systems like X.org's MIT Magic Cookie
--- or python trying to get random numbers seeding its dictionary hash
tables to avoid DOS attacks when python is used for CGI scripts ---
but guess what happens when python is used for systemd generator
scripts in early boot.... before the random seed file might even be
mounted?  In that case, python reverted to using /dev/urandom, which
was probably the right choice --- it didn't *need* to use getrandom.

>     1. Cutting down the number of bits needed to initialize the CRNG
>        to 256 bits (CHACHA20 cipher)

Does the attached patch (see below) help?

>     2. Complaining loudly when getrandom() is used while the CRNG is
>        not yet initialized.

A kernel printk will make it easier for people to understand why their
system is hung, in any case --- and which process is to blame.  So
that's definitely a good thing.

						- Ted

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 5d5ea4ce1442..b9b3a5a82abf 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -500,7 +500,7 @@ static int crng_init = 0;
 #define crng_ready() (likely(crng_init > 1))
 static int crng_init_cnt = 0;
 static unsigned long crng_global_init_time = 0;
-#define CRNG_INIT_CNT_THRESH (2*CHACHA_KEY_SIZE)
+#define CRNG_INIT_CNT_THRESH	CHACHA_KEY_SIZE
 static void _extract_crng(struct crng_state *crng, __u8 out[CHACHA_BLOCK_SIZE]);
 static void _crng_backtrack_protect(struct crng_state *crng,
 				    __u8 tmp[CHACHA_BLOCK_SIZE], int used);


* Re: Linux 5.3-rc8
  2019-09-12  8:25                   ` Theodore Y. Ts'o
@ 2019-09-12 11:34                     ` Linus Torvalds
  2019-09-12 11:58                       ` Willy Tarreau
                                         ` (2 more replies)
  2019-09-14  9:25                     ` Ahmed S. Darwish
  1 sibling, 3 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-12 11:34 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Ahmed S. Darwish, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	lkml

On Thu, Sep 12, 2019 at 9:25 AM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> Hmm, one thought might be GRND_FAILSAFE, which will wait up to two
> minutes before returning "best efforts" randomness and issuing a huge
> massive warning if it is triggered?

Yeah, based on (by now) _years_ of experience with people mis-using
"get me random numbers", I think the sense of a new flag needs to be
"yeah, I'm willing to wait for it".

Because most people just don't want to wait for it, and most people
don't think about it, and we need to make the default be for that
"don't think about it" crowd, with the people who ask for randomness
sources for a secure key having to very clearly and very explicitly
say "Yes, I understand that this can take minutes and can only be done
long after boot".

Even then people will screw that up because they copy code, or some
less than gifted rodent writes a library and decides "my library is so
important that I need that waiting sooper-sekrit-secure random
number", and then people use that broken library by mistake without
realizing that it's not going to be reliable at boot time.

An alternative might be to make getrandom() just return an error
instead of waiting. Sure, fill the buffer with "as random as we can"
stuff, but then return -EINVAL because you called us too early.

                  Linus


* Re: Linux 5.3-rc8
  2019-09-12 11:34                     ` Linus Torvalds
@ 2019-09-12 11:58                       ` Willy Tarreau
  2019-09-14 12:25                       ` [PATCH RFC] random: getrandom(2): don't block on non-initialized entropy pool Ahmed S. Darwish
  2019-09-14 15:02                       ` Linux 5.3-rc8 Ahmed S. Darwish
  2 siblings, 0 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-12 11:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Ahmed S. Darwish, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Thu, Sep 12, 2019 at 12:34:45PM +0100, Linus Torvalds wrote:
> An alternative might be to make getrandom() just return an error
> instead of waiting. Sure, fill the buffer with "as random as we can"
> stuff, but then return -EINVAL because you called us too early.

That's probably one of the most sensible approaches. I must say I feel
quite annoyed by what randomness has become due to the misuse of poor
random sources by security components suddenly forcing all these sources
to become strong and having to become unavailable for everything which
doesn't need strong random. And most of the time the stuff which doesn't
need a strong random happens during early boot. It can range from issuing
a MAC address before setting a link up (when you have no chance to get
entropy) to providing a UUID for a file system, or use of ephemeral
randoms for session keys for the first access to a device for its
configuration. A number of these often end up with a system not
booting, unable to self-configure itself, or not being available when
expected.

It's too late now to change existing applications, but doing
something like the above would probably at least allow applications to
implement a fall back with the choice of "hey Mr user, there's not
enough entropy yet to propose you a secure password, so please type
20 random chars on the keyboard so that I can complete it", or
conversely "the syscall failed but I know I can still use the
buffer's contents for a MAC address".

But making the syscall wait longer is never going to serve
anyone. Two minutes is an eternity for certain devices, and some from
the security world will consider that the syscall waited long enough
to produce a good security so it's OK to use it as a reliable source.
Failing immediately with whatever could be obtained is by far the
best solution in my opinion as the application has to take the
responsibility for using that buffer's contents.

Willy
-- still dreaming about the day boot loaders will collect entropy from
the DDR training phase and pass it to the kernel.


* Re: Linux 5.3-rc8
  2019-09-12  8:25                   ` Theodore Y. Ts'o
  2019-09-12 11:34                     ` Linus Torvalds
@ 2019-09-14  9:25                     ` Ahmed S. Darwish
  2019-09-14 16:27                       ` Theodore Y. Ts'o
  1 sibling, 1 reply; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-14  9:25 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Linus Torvalds, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	lkml

On Thu, Sep 12, 2019 at 04:25:30AM -0400, Theodore Y. Ts'o wrote:
> On Thu, Sep 12, 2019 at 05:44:21AM +0200, Ahmed S. Darwish wrote:
[...]
> 
> >     1. Cutting down the number of bits needed to initialize the CRNG
> >        to 256 bits (CHACHA20 cipher)
> 
> Does the attached patch (see below) help?
>
[...]
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 5d5ea4ce1442..b9b3a5a82abf 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -500,7 +500,7 @@ static int crng_init = 0;
>  #define crng_ready() (likely(crng_init > 1))
>  static int crng_init_cnt = 0;
>  static unsigned long crng_global_init_time = 0;
> -#define CRNG_INIT_CNT_THRESH (2*CHACHA_KEY_SIZE)
> +#define CRNG_INIT_CNT_THRESH	CHACHA_KEY_SIZE
>  static void _extract_crng(struct crng_state *crng, __u8 out[CHACHA_BLOCK_SIZE]);
>  static void _crng_backtrack_protect(struct crng_state *crng,
>  				    __u8 tmp[CHACHA_BLOCK_SIZE], int used);

Unfortunately, it only made the early fast init faster, but didn't fix
the normal crng init blockage :-(

Here's a trace log, got by applying the patch at [1]. The boot was
continued only after typing some random keys after ~30s:

#
# entries-in-buffer/entries-written: 22/22   #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
          <idle>-0     [001] dNh.     0.687088: crng_fast_load: crng threshold = 32
          <idle>-0     [001] dNh.     0.687089: crng_fast_load: crng_init_cnt = 0
          <idle>-0     [001] dNh.     0.687090: crng_fast_load: crng_init_cnt, now set to 16
          <idle>-0     [001] dNh.     0.705208: crng_fast_load: crng threshold = 32
          <idle>-0     [001] dNh.     0.705209: crng_fast_load: crng_init_cnt = 16
          <idle>-0     [001] dNh.     0.705209: crng_fast_load: crng_init_cnt, now set to 32
          <idle>-0     [001] dNh.     0.708048: crng_fast_load: random: fast init done
             lvm-165   [001] d...     2.417971: urandom_read: random: crng_init_cnt, now set to 0
 systemd-random--179   [003] ....     2.495669: wait_for_random_bytes.part.0: wait for randomness
     dbus-daemon-274   [006] dN..     3.294331: urandom_read: random: crng_init_cnt, now set to 0
     dbus-daemon-274   [006] dN..     3.316618: urandom_read: random: crng_init_cnt, now set to 0
         polkitd-286   [007] dN..     3.873918: urandom_read: random: crng_init_cnt, now set to 0
         polkitd-286   [007] dN..     3.874303: urandom_read: random: crng_init_cnt, now set to 0
         polkitd-286   [007] dN..     3.874375: urandom_read: random: crng_init_cnt, now set to 0
         polkitd-286   [007] d...     3.886204: urandom_read: random: crng_init_cnt, now set to 0
         polkitd-286   [007] d...     3.886217: urandom_read: random: crng_init_cnt, now set to 0
         polkitd-286   [007] d...     3.888519: urandom_read: random: crng_init_cnt, now set to 0
         polkitd-286   [007] d...     3.888529: urandom_read: random: crng_init_cnt, now set to 0
 gnome-session-b-321   [006] ....     4.292034: wait_for_random_bytes.part.0: wait for randomness
          <idle>-0     [002] dNh.    36.784001: crng_reseed: random: crng init done
 gnome-session-b-321   [006] ....    36.784019: wait_for_random_bytes.part.0: wait done
 systemd-random--179   [003] ....    36.784051: wait_for_random_bytes.part.0: wait done

[1] patch:

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 5d5ea4ce1442..4a50ee2c230d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -500,7 +500,7 @@ static int crng_init = 0;
 #define crng_ready() (likely(crng_init > 1))
 static int crng_init_cnt = 0;
 static unsigned long crng_global_init_time = 0;
-#define CRNG_INIT_CNT_THRESH (2*CHACHA_KEY_SIZE)
+#define CRNG_INIT_CNT_THRESH (CHACHA_KEY_SIZE)
 static void _extract_crng(struct crng_state *crng, __u8 out[CHACHA_BLOCK_SIZE]);
 static void _crng_backtrack_protect(struct crng_state *crng,
 				    __u8 tmp[CHACHA_BLOCK_SIZE], int used);
@@ -931,6 +931,9 @@ static int crng_fast_load(const char *cp, size_t len)
 	unsigned long flags;
 	char *p;
 
+	trace_printk("crng threshold = %d\n", CRNG_INIT_CNT_THRESH);
+	trace_printk("crng_init_cnt = %d\n", crng_init_cnt);
+
 	if (!spin_trylock_irqsave(&primary_crng.lock, flags))
 		return 0;
 	if (crng_init != 0) {
@@ -943,11 +946,15 @@ static int crng_fast_load(const char *cp, size_t len)
 		cp++; crng_init_cnt++; len--;
 	}
 	spin_unlock_irqrestore(&primary_crng.lock, flags);
+
+	trace_printk("crng_init_cnt, now set to %d\n", crng_init_cnt);
+
 	if (crng_init_cnt >= CRNG_INIT_CNT_THRESH) {
 		invalidate_batched_entropy();
 		crng_init = 1;
 		wake_up_interruptible(&crng_init_wait);
 		pr_notice("random: fast init done\n");
+		trace_printk("random: fast init done\n");
 	}
 	return 1;
 }
@@ -1033,6 +1040,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
 		process_random_ready_list();
 		wake_up_interruptible(&crng_init_wait);
 		pr_notice("random: crng init done\n");
+		trace_printk("random: crng init done\n");
 		if (unseeded_warning.missed) {
 			pr_notice("random: %d get_random_xx warning(s) missed "
 				  "due to ratelimiting\n",
@@ -1743,9 +1751,16 @@ EXPORT_SYMBOL(get_random_bytes);
  */
 int wait_for_random_bytes(void)
 {
+	int ret;
+
 	if (likely(crng_ready()))
 		return 0;
-	return wait_event_interruptible(crng_init_wait, crng_ready());
+
+	trace_printk("wait for randomness\n");
+	ret = wait_event_interruptible(crng_init_wait, crng_ready());
+	trace_printk("wait done\n");
+
+	return ret;
 }
 EXPORT_SYMBOL(wait_for_random_bytes);
 
@@ -1974,6 +1989,8 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 			       current->comm, nbytes);
 		spin_lock_irqsave(&primary_crng.lock, flags);
 		crng_init_cnt = 0;
+		trace_printk("random: crng_init_cnt, now set to %d\n",
+			     crng_init_cnt);
 		spin_unlock_irqrestore(&primary_crng.lock, flags);
 	}
 	nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3));

thanks,

-- 
darwi
http://darwish.chasingpointers.com


* [PATCH RFC] random: getrandom(2): don't block on non-initialized entropy pool
  2019-09-12 11:34                     ` Linus Torvalds
  2019-09-12 11:58                       ` Willy Tarreau
@ 2019-09-14 12:25                       ` Ahmed S. Darwish
  2019-09-14 14:08                         ` Alexander E. Patrakov
  2019-09-14 15:02                       ` Linux 5.3-rc8 Ahmed S. Darwish
  2 siblings, 1 reply; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-14 12:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Michael Kerrisk, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

getrandom() has been created as a new and more secure interface for
pseudorandom data requests.  Unlike /dev/urandom, it unconditionally
blocks until the entropy pool has been properly initialized.

While getrandom() has no guaranteed upper bound for its waiting time,
user-space has been abusing it by issuing the syscall, from shared
libraries no less, during the main system boot sequence.

Thus, on certain setups where there is no hwrng (embedded), or the
hwrng is not trusted by some users (intel RDRAND), or sometimes it's
just broken (amd RDRAND), the system boot can be *reliably* blocked.

The issue is further exacerbated by recent file-system optimizations,
e.g. b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), which batches
the inode table IO issued by directory lookups, and thus minimizes the
number of disk interrupts and entropy during boot. After that commit,
a blocked boot can be reliably reproduced on a Thinkpad E480 laptop
with standard ArchLinux user-space.

Thus, don't trust user-space on calling getrandom() from the right
context. Just never block, and return -EINVAL if entropy is not yet
available.

Link: https://lkml.kernel.org/r/CAHk-=wjyH910+JRBdZf_Y9G54c1M=LBF8NKXB6vJcm9XjLnRfg@mail.gmail.com
Link: https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc
Link: https://lkml.kernel.org/r/20190911173624.GI2740@mit.edu
Link: https://lkml.kernel.org/r/20180514003034.GI14763@thunk.org

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
---

Notes:
    This feels very risky at the very end of -rc8, so only sending
    this as an RFC. The system of course reliably boots with this,
    and the log, as expected, powerfully warns all callers:

    $ dmesg | grep random
    [0.236472] random: get_random_bytes called from start_kernel+0x30f/0x4d7 with crng_init=0
    [0.680263] random: fast init done
    [2.500346] random: lvm: uninitialized urandom read (4 bytes read)
    [2.595125] random: systemd-random-: invalid getrandom request (512 bytes): crng not ready
    [2.595126] random: systemd-random-: uninitialized urandom read (512 bytes read)
    [3.427699] random: dbus-daemon: uninitialized urandom read (12 bytes read)
    [3.979425] urandom_read: 1 callbacks suppressed
    [3.979426] random: polkitd: uninitialized urandom read (8 bytes read)
    [3.979726] random: polkitd: uninitialized urandom read (8 bytes read)
    [3.979752] random: polkitd: uninitialized urandom read (8 bytes read)
    [4.473398] random: gnome-session-b: invalid getrandom request (16 bytes): crng not ready
    [4.473404] random: gnome-session-b: invalid getrandom request (16 bytes): crng not ready
    [4.473409] random: gnome-session-b: invalid getrandom request (16 bytes): crng not ready
    [5.265636] random: crng init done
    [5.265649] random: 3 urandom warning(s) missed due to ratelimiting
    [5.265652] random: 1 getrandom warning(s) missed due to ratelimiting

 drivers/char/random.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 4a50ee2c230d..309dc5ddf370 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -511,6 +511,8 @@ static struct ratelimit_state unseeded_warning =
 	RATELIMIT_STATE_INIT("warn_unseeded_randomness", HZ, 3);
 static struct ratelimit_state urandom_warning =
 	RATELIMIT_STATE_INIT("warn_urandom_randomness", HZ, 3);
+static struct ratelimit_state getrandom_warning =
+	RATELIMIT_STATE_INIT("warn_getrandom_notavail", HZ, 3);

 static int ratelimit_disable __read_mostly;

@@ -1053,6 +1055,12 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
 				  urandom_warning.missed);
 			urandom_warning.missed = 0;
 		}
+		if (getrandom_warning.missed) {
+			pr_notice("random: %d getrandom warning(s) missed "
+				  "due to ratelimiting\n",
+				  getrandom_warning.missed);
+			getrandom_warning.missed = 0;
+		}
 	}
 }

@@ -1915,6 +1923,7 @@ int __init rand_initialize(void)
 	crng_global_init_time = jiffies;
 	if (ratelimit_disable) {
 		urandom_warning.interval = 0;
+		getrandom_warning.interval = 0;
 		unseeded_warning.interval = 0;
 	}
 	return 0;
@@ -2138,8 +2147,6 @@ const struct file_operations urandom_fops = {
 SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
 		unsigned int, flags)
 {
-	int ret;
-
 	if (flags & ~(GRND_NONBLOCK|GRND_RANDOM))
 		return -EINVAL;

@@ -2152,9 +2159,13 @@ SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
 	if (!crng_ready()) {
 		if (flags & GRND_NONBLOCK)
 			return -EAGAIN;
-		ret = wait_for_random_bytes();
-		if (unlikely(ret))
-			return ret;
+
+		if (__ratelimit(&getrandom_warning))
+			pr_notice("random: %s: invalid getrandom request "
+				  "(%zu bytes): crng not ready\n",
+				  current->comm, count);
+
+		return -EINVAL;
 	}
 	return urandom_read(NULL, buf, count, NULL);
 }
--
2.23.0


* Re: [PATCH RFC] random: getrandom(2): don't block on non-initialized entropy pool
  2019-09-14 12:25                       ` [PATCH RFC] random: getrandom(2): don't block on non-initialized entropy pool Ahmed S. Darwish
@ 2019-09-14 14:08                         ` Alexander E. Patrakov
  2019-09-15  5:22                           ` [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized Theodore Y. Ts'o
  0 siblings, 1 reply; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-14 14:08 UTC (permalink / raw)
  To: Ahmed S. Darwish, Linus Torvalds
  Cc: Theodore Y. Ts'o, Michael Kerrisk, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4, lkml

(resending without HTML this time, sorry for the duplicate)
14.09.2019 17:25, Ahmed S. Darwish wrote:
> getrandom() has been created as a new and more secure interface for
> pseudorandom data requests.  Unlike /dev/urandom, it unconditionally
> blocks until the entropy pool has been properly initialized.
> 
> While getrandom() has no guaranteed upper bound for its waiting time,
> user-space has been abusing it by issuing the syscall, from shared
> libraries no less, during the main system boot sequence.
> 
> Thus, on certain setups where there is no hwrng (embedded), or the
> hwrng is not trusted by some users (intel RDRAND), or sometimes it's
> just broken (amd RDRAND), the system boot can be *reliably* blocked.
> 
> The issue is further exacerbated by recent file-system optimizations,
> e.g. b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), which batches
> the inode table IO issued by directory lookups, and thus minimizes the
> number of disk interrupts and entropy during boot. After that commit,
> a blocked boot can be reliably reproduced on a Thinkpad E480 laptop
> with standard ArchLinux user-space.
> 
> Thus, don't trust user-space on calling getrandom() from the right
> context. Just never block, and return -EINVAL if entropy is not yet
> available.
> 
> Link: https://lkml.kernel.org/r/CAHk-=wjyH910+JRBdZf_Y9G54c1M=LBF8NKXB6vJcm9XjLnRfg@mail.gmail.com
> Link: https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc
> Link: https://lkml.kernel.org/r/20190911173624.GI2740@mit.edu
> Link: https://lkml.kernel.org/r/20180514003034.GI14763@thunk.org

Let me reword the commit message for a hopefully better historical 
perspective.

===
getrandom() has been created as a new and more secure interface for
pseudorandom data requests. It attempted to solve two problems, as
compared to /dev/{u,}random: the need to open a file descriptor (which
can fail) and the possibility of getting not-so-random data from the
incompletely initialized entropy pool. It has succeeded in the first
improvement, but failed horribly in the second one: it blocks until the
entropy pool has been properly initialized if called without
GRND_NONBLOCK, and fails with -EAGAIN if called with it, and neither of
these behaviors is suitable for the early boot stage.

The issue is further exacerbated by recent file-system optimizations,
e.g. b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), which merges
the inode table I/O issued by the directory lookup code, and thus
minimizes the number of disk interrupts and the entropy gathered during
boot. After that commit, a blocked boot can be reliably reproduced on a
Thinkpad E480 laptop with standard ArchLinux user-space.

Thus, on certain setups where there is no hwrng (embedded systems or 
non-KVM virtual machines), or the hwrng is not trusted by some users 
(intel RDRAND), or sometimes it's just broken (amd RDRAND), the system 
boot can be *reliably* blocked. It can therefore be argued that there is
no way to use getrandom() on Linux correctly, especially from shared 
libraries: GRND_NONBLOCK has to be used, and a fallback to some other 
interface like /dev/urandom is required, thus making the net result no 
better than just using /dev/urandom unconditionally.

While getrandom() has no guaranteed upper bound for its waiting time, 
user-space has been using it incorrectly by issuing the syscall, from 
shared libraries no less, during the main system boot sequence, without 
GRND_NONBLOCK.

We can't trust user-space to call getrandom() from the right context.
Therefore, just never block, and return -EINVAL (with some entropy still
in the buffer) if the requested amount of entropy is not yet available.

Link: https://github.com/openbsd/src/commit/edb2eeb7da8494998d0073f8aaeb8478cee5e00b
Link: https://lkml.kernel.org/r/CAHk-=wjyH910+JRBdZf_Y9G54c1M=LBF8NKXB6vJcm9XjLnRfg@mail.gmail.com
Link: https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc
Link: https://lkml.kernel.org/r/20190911173624.GI2740@mit.edu
Link: https://lkml.kernel.org/r/20180514003034.GI14763@thunk.org
===

That said, I have an issue with the -EINVAL return code here: it is also
returned in cases where the parameters passed are genuinely not
understood by the kernel, and no entropy has been written to the buffer.
Therefore, the caller has to assume that the call has failed, discard
all the bytes in the buffer, and try some fallback strategy. Can we
think of some other error code?

The other part of me thinks that triggering a fallback, by returning an 
error code, is never the right thing to do. If the "uninitialized" state 
exists at all, applications and libraries have to care (and I would 
expect their authors who don't pass GRND_RANDOM to just fall back to 
/dev/urandom). Therefore, we are back to square one, except that the 
fallback code in the application is something that is only rarely 
exercised, and thus has higher chances to accumulate bugs. Because the 
only expected/reasonable fallback is to read from /dev/urandom, the 
whole result looks like shifting the responsibility/blame without 
achieving anything useful. As the issue is not really solvable, just 
give the application not-so-random data, as /dev/urandom does, without 
any indication - this would at least keep the benefit of not needing a 
file descriptor. It is simply not possible to do anything better without
eliminating the userspace-visible "uninitialized" crng state, e.g. with
the help of entropy input from the boot loader, or a build-time config
or command-line option to trust the in-kernel jitter entropy.

> 
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
> ---
> 
> Notes:
>      This feels very risky at the very end of -rc8, so only sending
>      this as an RFC. The system of course reliably boots with this,
>      and the log, as expected, powerfully warns all callers:
> 
>      $ dmesg | grep random
>      [0.236472] random: get_random_bytes called from start_kernel+0x30f/0x4d7 with crng_init=0
>      [0.680263] random: fast init done
>      [2.500346] random: lvm: uninitialized urandom read (4 bytes read)
>      [2.595125] random: systemd-random-: invalid getrandom request (512 bytes): crng not ready
>      [2.595126] random: systemd-random-: uninitialized urandom read (512 bytes read)
>      [3.427699] random: dbus-daemon: uninitialized urandom read (12 bytes read)
>      [3.979425] urandom_read: 1 callbacks suppressed
>      [3.979426] random: polkitd: uninitialized urandom read (8 bytes read)
>      [3.979726] random: polkitd: uninitialized urandom read (8 bytes read)
>      [3.979752] random: polkitd: uninitialized urandom read (8 bytes read)
>      [4.473398] random: gnome-session-b: invalid getrandom request (16 bytes): crng not ready
>      [4.473404] random: gnome-session-b: invalid getrandom request (16 bytes): crng not ready
>      [4.473409] random: gnome-session-b: invalid getrandom request (16 bytes): crng not ready
>      [5.265636] random: crng init done
>      [5.265649] random: 3 urandom warning(s) missed due to ratelimiting
>      [5.265652] random: 1 getrandom warning(s) missed due to ratelimiting
> 
>   drivers/char/random.c | 21 ++++++++++++++++-----
>   1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 4a50ee2c230d..309dc5ddf370 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -511,6 +511,8 @@ static struct ratelimit_state unseeded_warning =
>   	RATELIMIT_STATE_INIT("warn_unseeded_randomness", HZ, 3);
>   static struct ratelimit_state urandom_warning =
>   	RATELIMIT_STATE_INIT("warn_urandom_randomness", HZ, 3);
> +static struct ratelimit_state getrandom_warning =
> +	RATELIMIT_STATE_INIT("warn_getrandom_notavail", HZ, 3);
> 
>   static int ratelimit_disable __read_mostly;
> 
> @@ -1053,6 +1055,12 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
>   				  urandom_warning.missed);
>   			urandom_warning.missed = 0;
>   		}
> +		if (getrandom_warning.missed) {
> +			pr_notice("random: %d getrandom warning(s) missed "
> +				  "due to ratelimiting\n",
> +				  getrandom_warning.missed);
> +			getrandom_warning.missed = 0;
> +		}
>   	}
>   }
> 
> @@ -1915,6 +1923,7 @@ int __init rand_initialize(void)
>   	crng_global_init_time = jiffies;
>   	if (ratelimit_disable) {
>   		urandom_warning.interval = 0;
> +		getrandom_warning.interval = 0;
>   		unseeded_warning.interval = 0;
>   	}
>   	return 0;
> @@ -2138,8 +2147,6 @@ const struct file_operations urandom_fops = {
>   SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
>   		unsigned int, flags)
>   {
> -	int ret;
> -
>   	if (flags & ~(GRND_NONBLOCK|GRND_RANDOM))
>   		return -EINVAL;
> 
> @@ -2152,9 +2159,13 @@ SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
>   	if (!crng_ready()) {
>   		if (flags & GRND_NONBLOCK)
>   			return -EAGAIN;
> -		ret = wait_for_random_bytes();
> -		if (unlikely(ret))
> -			return ret;
> +
> +		if (__ratelimit(&getrandom_warning))
> +			pr_notice("random: %s: invalid getrandom request "
> +				  "(%zd bytes): crng not ready",
> +				  current->comm, count);
> +
> +		return -EINVAL;
>   	}
>   	return urandom_read(NULL, buf, count, NULL);
>   }
> --
> 2.23.0
> 


-- 
Alexander E. Patrakov


* Re: Linux 5.3-rc8
  2019-09-12 11:34                     ` Linus Torvalds
  2019-09-12 11:58                       ` Willy Tarreau
  2019-09-14 12:25                       ` [PATCH RFC] random: getrandom(2): don't block on non-initialized entropy pool Ahmed S. Darwish
@ 2019-09-14 15:02                       ` Ahmed S. Darwish
  2019-09-14 16:30                         ` Linus Torvalds
  2 siblings, 1 reply; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-14 15:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	Lennart Poettering, lkml

On Thu, Sep 12, 2019 at 12:34:45PM +0100, Linus Torvalds wrote:
> On Thu, Sep 12, 2019 at 9:25 AM Theodore Y. Ts'o <tytso@mit.edu> wrote:
> >
> > Hmm, one thought might be GRND_FAILSAFE, which will wait up to two
> > minutes before returning "best efforts" randomness and issuing a huge
> > massive warning if it is triggered?
> 
> Yeah, based on (by now) _years_ of experience with people mis-using
> "get me random numbers", I think the sense of a new flag needs to be
> "yeah, I'm willing to wait for it".
>
> Because most people just don't want to wait for it, and most people
> don't think about it, and we need to make the default be for that
> "don't think about it" crowd, with the people who ask for randomness
> sources for a secure key having to very clearly and very explicitly
> say "Yes, I understand that this can take minutes and can only be done
> long after boot".
> 
> Even then people will screw that up because they copy code, or some
> less than gifted rodent writes a library and decides "my library is so
> important that I need that waiting sooper-sekrit-secure random
> number", and then people use that broken library by mistake without
> realizing that it's not going to be reliable at boot time.
> 
> An alternative might be to make getrandom() just return an error
> instead of waiting. Sure, fill the buffer with "as random as we can"
> stuff, but then return -EINVAL because you called us too early.
>

ACK, that's probably _the_ most sensible approach. Only caveat is
the slight change in user-space API semantics though...

For example, this breaks the just released systemd-random-seed(8)
as it _explicitly_ requests blocking behavior from getrandom() here:

    => src/random-seed/random-seed.c:
    /*
     * Let's make this whole job asynchronous, i.e. let's make
     * ourselves a barrier for proper initialization of the
     * random pool.
     */
     k = getrandom(buf, buf_size, GRND_NONBLOCK);
     if (k < 0 && errno == EAGAIN && synchronous) {
         log_notice("Kernel entropy pool is not initialized yet, "
                    "waiting until it is.");
                    
         k = getrandom(buf, buf_size, 0); /* retry synchronously */
     }
     if (k < 0) {
         log_debug_errno(errno, "Failed to read random data with "
                         "getrandom(), falling back to "
                         "/dev/urandom: %m");
     } else if ((size_t) k < buf_size) {
         log_debug("Short read from getrandom(), falling back to "
                   "/dev/urandom: %m");
     } else {
         getrandom_worked = true;
     }

Nonetheless, a slightly broken systemd-random-seed, that was just
released only 11 days ago (v243), is honestly much better than a
*non-booting system*...

I've sent an RFC patch at [1].

To handle the systemd case, I'll add the discussed "yeah, I'm
willing to wait for it" flag (GRND_BLOCK) in v2.

If this whole approach is going to be merged, and the slight ABI
breakage is to be tolerated (hmmmmm?), I wonder how systemd
random-seed will handle the semantics change without resorting to
ugly kernel version checks...

thanks,

[1] https://lkml.kernel.org/r/20190914122500.GA1425@darwi-home-pc

--
darwi
http://darwish.chasingpointers.com


* Re: Linux 5.3-rc8
  2019-09-14  9:25                     ` Ahmed S. Darwish
@ 2019-09-14 16:27                       ` Theodore Y. Ts'o
  0 siblings, 0 replies; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-14 16:27 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Linus Torvalds, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	lkml

On Sat, Sep 14, 2019 at 11:25:09AM +0200, Ahmed S. Darwish wrote:
> Unfortunately, it only made the early fast init faster, but didn't fix
> the normal crng init blockage :-(

Yeah, I see why; the original goal was to do the fast init so that
using /dev/urandom, even before we were fully initialized, wouldn't be
deadly.  But then we still wanted 128 bits of estimated entropy the
old fashioned way before we declare the CRNG initialized.

There are a bunch of things that I think I want to do long-term, such
as make CONFIG_RANDOM_TRUST_CPU the default, trying to get random
entropy from the bootloader, etc.  But none of this is something we
should do in a hurry, especially this close before 5.3 drops.  So I
think I want to fix things this way, which is a bit of a hack, but I
think it's better than simply reverting commit b03755ad6f33.

Ahmed, Linus, what do you think?

				- Ted

From f1a111bff3b996258410e51a3760fc39bbd7058f Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso@mit.edu>
Date: Sat, 14 Sep 2019 12:21:39 -0400
Subject: [PATCH] ext4: don't plug in __ext4_get_inode_loc if the CRNG is not
 initialized

Unfortunately, commit b03755ad6f33 ("ext4: make __ext4_get_inode_loc
plug") is so effective that on some systems where RDRAND is not
trusted, the boot hangs.  On an Arch system, the GNOME display manager
uses getrandom(2) in early boot to get randomness for the MIT Magic
Cookie (which isn't really secure, so using getrandom(2) is a bit of a
waste).

Since this is causing problems, although arguably this is userspace's
fault, let's not do it if the CRNG is not yet initialized.  This is
better than trying to tweak the random number generator right before
5.3 is released (I'm afraid we'll accidentally make it _too_ weak),
and it's also better than simply completely reverting b03755ad6f33.

We're effectively reverting it while the RNG is not yet initialized,
to slow down the boot and make it less efficient, just to work around
broken init setups.

Fixes: b03755ad6f33 ("ext4: make __ext4_get_inode_loc plug")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 fs/ext4/inode.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 4e271b509af1..41ad93f11b6d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4534,6 +4534,7 @@ static int __ext4_get_inode_loc(struct inode *inode,
 	struct buffer_head	*bh;
 	struct super_block	*sb = inode->i_sb;
 	ext4_fsblk_t		block;
+	int			be_inefficient = !rng_is_initialized();
 	struct blk_plug		plug;
 	int			inodes_per_block, inode_offset;
 
@@ -4541,7 +4542,6 @@ static int __ext4_get_inode_loc(struct inode *inode,
 	if (inode->i_ino < EXT4_ROOT_INO ||
 	    inode->i_ino > le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count))
 		return -EFSCORRUPTED;
-
 	iloc->block_group = (inode->i_ino - 1) / EXT4_INODES_PER_GROUP(sb);
 	gdp = ext4_get_group_desc(sb, iloc->block_group, NULL);
 	if (!gdp)
@@ -4623,7 +4623,8 @@ static int __ext4_get_inode_loc(struct inode *inode,
 		 * If we need to do any I/O, try to pre-readahead extra
 		 * blocks from the inode table.
 		 */
-		blk_start_plug(&plug);
+		if (likely(!be_inefficient))
+			blk_start_plug(&plug);
 		if (EXT4_SB(sb)->s_inode_readahead_blks) {
 			ext4_fsblk_t b, end, table;
 			unsigned num;
@@ -4654,7 +4655,8 @@ static int __ext4_get_inode_loc(struct inode *inode,
 		get_bh(bh);
 		bh->b_end_io = end_buffer_read_sync;
 		submit_bh(REQ_OP_READ, REQ_META | REQ_PRIO, bh);
-		blk_finish_plug(&plug);
+		if (likely(!be_inefficient))
+			blk_finish_plug(&plug);
 		wait_on_buffer(bh);
 		if (!buffer_uptodate(bh)) {
 			EXT4_ERROR_INODE_BLOCK(inode, block,
-- 
2.23.0



* Re: Linux 5.3-rc8
  2019-09-14 15:02                       ` Linux 5.3-rc8 Ahmed S. Darwish
@ 2019-09-14 16:30                         ` Linus Torvalds
  2019-09-14 16:35                           ` Alexander E. Patrakov
                                             ` (3 more replies)
  0 siblings, 4 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-14 16:30 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Theodore Y. Ts'o, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	Lennart Poettering, lkml

On Sat, Sep 14, 2019 at 8:02 AM Ahmed S. Darwish <darwish.07@gmail.com> wrote:
>
> On Thu, Sep 12, 2019 at 12:34:45PM +0100, Linus Torvalds wrote:
> >
> > An alternative might be to make getrandom() just return an error
> > instead of waiting. Sure, fill the buffer with "as random as we can"
> > stuff, but then return -EINVAL because you called us too early.
>
> ACK, that's probably _the_ most sensible approach. Only caveat is
> the slight change in user-space API semantics though...
>
> For example, this breaks the just released systemd-random-seed(8)
> as it _explicitly_ requests blocking behavior from getrandom() here:
>

Actually, I would argue that the "don't ever block, instead fill
buffer and return error instead" fixes this broken case.

>     => src/random-seed/random-seed.c:
>     /*
>      * Let's make this whole job asynchronous, i.e. let's make
>      * ourselves a barrier for proper initialization of the
>      * random pool.
>      */
>      k = getrandom(buf, buf_size, GRND_NONBLOCK);
>      if (k < 0 && errno == EAGAIN && synchronous) {
>          log_notice("Kernel entropy pool is not initialized yet, "
>                     "waiting until it is.");
>
>          k = getrandom(buf, buf_size, 0); /* retry synchronously */
>      }

Yeah, the above is yet another example of completely broken garbage.

You can't just wait and block at boot. That is simply 100%
unacceptable, and always has been, exactly because that may
potentially mean waiting forever since you didn't do anything that
actually is likely to add any entropy.

>      if (k < 0) {
>          log_debug_errno(errno, "Failed to read random data with "
>                          "getrandom(), falling back to "
>                          "/dev/urandom: %m");

At least it gets a log message.

So I think the right thing to do is to just make getrandom() return
-EINVAL, and refuse to block.

As mentioned, this has already historically been a huge issue on
embedded devices, and with disks turning not just to NVMe but to
actual polling nvdimm/xpoint/flash, the amount of true "entropy"
randomness we can give at boot is very questionable.

We can (and will) continue to do a best-effort thing (including very
much using rdrand and friends), but the whole "wait for entropy"
simply *must* stop.

> I've sent an RFC patch at [1].
>
> [1] https://lkml.kernel.org/r/20190914122500.GA1425@darwi-home-pc

Looks reasonable to me. Except I'd just make it simpler and make it a
big WARN_ON_ONCE(), which is a lot harder to miss than pr_notice().
Make it clear that it is a *bug* if user space thinks it should wait
at boot time.

Also, we might even want to just fill the buffer and return 0 at that
point, to make sure that even more broken user space doesn't then try
to sleep manually and turn it into a "I'll wait myself" loop.

                 Linus


* Re: Linux 5.3-rc8
  2019-09-14 16:30                         ` Linus Torvalds
@ 2019-09-14 16:35                           ` Alexander E. Patrakov
  2019-09-14 16:52                             ` Linus Torvalds
  2019-09-14 21:11                           ` Ahmed S. Darwish
                                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-14 16:35 UTC (permalink / raw)
  To: Linus Torvalds, Ahmed S. Darwish
  Cc: Theodore Y. Ts'o, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, Lennart Poettering,
	lkml



On 14.09.2019 21:30, Linus Torvalds wrote:
> On Sat, Sep 14, 2019 at 8:02 AM Ahmed S. Darwish <darwish.07@gmail.com> wrote:
>>
>> On Thu, Sep 12, 2019 at 12:34:45PM +0100, Linus Torvalds wrote:
>>>
>>> An alternative might be to make getrandom() just return an error
>>> instead of waiting. Sure, fill the buffer with "as random as we can"
>>> stuff, but then return -EINVAL because you called us too early.
>>
>> ACK, that's probably _the_ most sensible approach. Only caveat is
>> the slight change in user-space API semantics though...
>>
>> For example, this breaks the just released systemd-random-seed(8)
>> as it _explicitly_ requests blocking behavior from getrandom() here:
>>
> 
> Actually, I would argue that the "don't ever block, instead fill
> buffer and return error instead" fixes this broken case.
> 
>>      => src/random-seed/random-seed.c:
>>      /*
>>       * Let's make this whole job asynchronous, i.e. let's make
>>       * ourselves a barrier for proper initialization of the
>>       * random pool.
>>       */
>>       k = getrandom(buf, buf_size, GRND_NONBLOCK);
>>       if (k < 0 && errno == EAGAIN && synchronous) {
>>           log_notice("Kernel entropy pool is not initialized yet, "
>>                      "waiting until it is.");
>>
>>           k = getrandom(buf, buf_size, 0); /* retry synchronously */
>>       }
> 
> Yeah, the above is yet another example of completely broken garbage.
> 
> You can't just wait and block at boot. That is simply 100%
> unacceptable, and always has been, exactly because that may
> potentially mean waiting forever since you didn't do anything that
> actually is likely to add any entropy.
> 
>>       if (k < 0) {
>>           log_debug_errno(errno, "Failed to read random data with "
>>                           "getrandom(), falling back to "
>>                           "/dev/urandom: %m");
> 
> At least it gets a log message.
> 
> So I think the right thing to do is to just make getrandom() return
> -EINVAL, and refuse to block.

Let me repeat: not -EINVAL, please. Please find some other error code, 
so that the application could sensibly distinguish between this case 
(low quality entropy is in the buffer) and the "kernel is too dumb" case 
(and no entropy is in the buffer).


-- 
Alexander E. Patrakov




* Re: Linux 5.3-rc8
  2019-09-14 16:35                           ` Alexander E. Patrakov
@ 2019-09-14 16:52                             ` Linus Torvalds
  2019-09-14 17:09                               ` Alexander E. Patrakov
  2019-09-15  6:56                               ` Lennart Poettering
  0 siblings, 2 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-14 16:52 UTC (permalink / raw)
  To: Alexander E. Patrakov
  Cc: Ahmed S. Darwish, Theodore Y. Ts'o, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4,
	Lennart Poettering, lkml

On Sat, Sep 14, 2019 at 9:35 AM Alexander E. Patrakov
<patrakov@gmail.com> wrote:
>
> Let me repeat: not -EINVAL, please. Please find some other error code,
> so that the application could sensibly distinguish between this case
> (low quality entropy is in the buffer) and the "kernel is too dumb" case
> (and no entropy is in the buffer).

I'm not convinced we want applications to see that difference.

The fact is, every time an application thinks it cares, it has caused
problems. I can just see systemd saying "ok, the kernel didn't block,
so I'll just do

   while (getrandom(x) == -ENOENTROPY)
       sleep(1);

instead. Which is still completely buggy garbage.

The fact is, we can't guarantee entropy in general. It's probably
there in practice, particularly with user space saving randomness from
last boot etc, but that kind of data may be real entropy, but the
kernel cannot *guarantee* that it is.

And people don't like us guaranteeing that rdrand/rdseed is "real
entropy" either, since they don't trust the CPU hw either.

Which means that we're all kinds of screwed. The whole "we guarantee
entropy" model is broken.

               Linus


* Re: Linux 5.3-rc8
  2019-09-14 16:52                             ` Linus Torvalds
@ 2019-09-14 17:09                               ` Alexander E. Patrakov
  2019-09-14 19:19                                 ` Linus Torvalds
  2019-09-15  6:56                               ` Lennart Poettering
  1 sibling, 1 reply; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-14 17:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ahmed S. Darwish, Theodore Y. Ts'o, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4,
	Lennart Poettering, lkml



On 14.09.2019 21:52, Linus Torvalds wrote:
> On Sat, Sep 14, 2019 at 9:35 AM Alexander E. Patrakov
> <patrakov@gmail.com> wrote:
>>
>> Let me repeat: not -EINVAL, please. Please find some other error code,
>> so that the application could sensibly distinguish between this case
>> (low quality entropy is in the buffer) and the "kernel is too dumb" case
>> (and no entropy is in the buffer).
> 
> I'm not convinced we want applications to see that difference.
> 
> The fact is, every time an application thinks it cares, it has caused
> problems. I can just see systemd saying "ok, the kernel didn't block,
> so I'll just do
> 
>     while (getrandom(x) == -ENOENTROPY)
>         sleep(1);
> 
> instead. Which is still completely buggy garbage.

OK, I understand this viewpoint. But then still, -EINVAL is not the
answer, because a hypothetical evil version of systemd will use -EINVAL
as -ENOENTROPY (with flags == 0 and a reasonable buffer size, there is
simply no other reason for the kernel to return -EINVAL). Yes, I
understand that this is a complete reversal of my previous argument.
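
To make that concrete, here is a sketch of how user space could tell
the two -EINVAL cases apart under the semantics of the RFC patch (an
assumption: those semantics are only a proposal, not a stable ABI).
The enum and the function name are made up for illustration.

```c
#include <errno.h>
#include <sys/random.h>

/* Hypothetical classification of a failed getrandom() call. */
enum einval_cause {
	EINVAL_BAD_FLAGS,	/* kernel did not understand the flags */
	EINVAL_POOL_NOT_READY,	/* RFC semantics: crng not initialized */
	NOT_EINVAL,		/* some other errno (EAGAIN, EINTR, ...) */
};

/*
 * Assumes the RFC patch: -EINVAL is returned either for unknown flag
 * bits or because the crng is not ready yet.  With only known flag
 * bits set, the second cause is the only one left.
 */
static enum einval_cause classify_getrandom_error(unsigned int flags,
						  int err)
{
	if (err != EINVAL)
		return NOT_EINVAL;
	if (flags & ~(unsigned int)(GRND_NONBLOCK | GRND_RANDOM))
		return EINVAL_BAD_FLAGS;
	return EINVAL_POOL_NOT_READY;
}
```

With flags == 0, an EINVAL always maps to EINVAL_POOL_NOT_READY, which
is all a sleep-and-retry loop would need.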

> The fact is, we can't guarantee entropy in general. It's probably
> there in practice, particularly with user space saving randomness from
> last boot etc, but that kind of data may be real entropy, but the
> kernel cannot *guarantee* that it is.
> 
> And people don't like us guaranteeing that rdrand/rdseed is "real
> entropy" either, since they don't trust the CPU hw either.
> 
> Which means that we're all kinds of screwed. The whole "we guarantee
> entropy" model is broken.

I agree here. Given that you suggested "to just fill the buffer and 
return 0" in the previous mail (well, I think you really meant "return 
buflen", otherwise ENOENTROPY == 0 and your previous objection applies), 
let's do just that. As a bonus, it saves applications from the complex 
dance with retrying via /dev/urandom and finally brings a reliable API 
(modulo old and broken kernels) to get random numbers (well, as random 
as possible right now) without needing a file descriptor.

-- 
Alexander E. Patrakov




* Re: Linux 5.3-rc8
  2019-09-14 17:09                               ` Alexander E. Patrakov
@ 2019-09-14 19:19                                 ` Linus Torvalds
  0 siblings, 0 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-14 19:19 UTC (permalink / raw)
  To: Alexander E. Patrakov
  Cc: Ahmed S. Darwish, Theodore Y. Ts'o, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4,
	Lennart Poettering, lkml

On Sat, Sep 14, 2019 at 10:09 AM Alexander E. Patrakov
<patrakov@gmail.com> wrote:
>
> > Which means that we're all kinds of screwed. The whole "we guarantee
> > entropy" model is broken.
>
> I agree here. Given that you suggested "to just fill the buffer and
> return 0" in the previous mail (well, I think you really meant "return
> buflen", otherwise ENOENTROPY == 0 and your previous objection applies),

Right.

The question remains when we should WARN_ON(), though.

For example, if somebody did save entropy between boots, we probably
should accept that - at least in the sense of not warning when they
then ask for randomness data back.

And if the hardware does have a functioning rdrand, we probably should
accept that too - simply because not accepting it and warning sounds a
bit too annoying.

But we definitely *should* have a warning for people who build
embedded devices that we can't see any reasonable amount of possible
entropy. Those have definitely happened, and it's a serious and real
security issue.

> let's do just that. As a bonus, it saves applications from the complex
> dance with retrying via /dev/urandom and finally brings a reliable API
> (modulo old and broken kernels) to get random numbers (well, as random
> as possible right now) without needing a file descriptor.

Yeah, well, the question in the end always is "what is reliable".

Waiting has definitely not been reliable, and has only ever caused problems.

Returning an error (or some status while still doing a best effort)
would be reasonable, but I really do think that people will mis-use
that. We just have too much of a history of people having the mindset
that they can just fall back to something better - like waiting - and
they are always wrong.

Just returning random data is the right thing, but we do need to make
sure that system developers see a warning if they do something
obviously wrong (so that the embedded people without even a real-time
clock to initialize any bits of entropy AT ALL won't think that they
can generate a system key on their router).

               Linus


* Re: Linux 5.3-rc8
  2019-09-14 16:30                         ` Linus Torvalds
  2019-09-14 16:35                           ` Alexander E. Patrakov
@ 2019-09-14 21:11                           ` Ahmed S. Darwish
  2019-09-14 22:05                             ` Martin Steigerwald
  2019-09-14 22:24                             ` Theodore Y. Ts'o
  2019-09-15  6:51                           ` Lennart Poettering
  2019-09-23 20:49                           ` chaos generating driver was " Pavel Machek
  3 siblings, 2 replies; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-14 21:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	Lennart Poettering, lkml

Hi,

On Sat, Sep 14, 2019 at 09:30:19AM -0700, Linus Torvalds wrote:
> On Sat, Sep 14, 2019 at 8:02 AM Ahmed S. Darwish <darwish.07@gmail.com> wrote:
> >
> > On Thu, Sep 12, 2019 at 12:34:45PM +0100, Linus Torvalds wrote:
> > >
> > > An alternative might be to make getrandom() just return an error
> > > instead of waiting. Sure, fill the buffer with "as random as we can"
> > > stuff, but then return -EINVAL because you called us too early.
> >
> > ACK, that's probably _the_ most sensible approach. Only caveat is
> > the slight change in user-space API semantics though...
> >
> > For example, this breaks the just released systemd-random-seed(8)
> > as it _explicitly_ requests blocking behavior from getrandom() here:
> >
> 
> Actually, I would argue that the "don't ever block, instead fill
> buffer and return error instead" fixes this broken case.
> 
> >     => src/random-seed/random-seed.c:
> >     /*
> >      * Let's make this whole job asynchronous, i.e. let's make
> >      * ourselves a barrier for proper initialization of the
> >      * random pool.
> >      */
> >      k = getrandom(buf, buf_size, GRND_NONBLOCK);
> >      if (k < 0 && errno == EAGAIN && synchronous) {
> >          log_notice("Kernel entropy pool is not initialized yet, "
> >                     "waiting until it is.");
> >
> >          k = getrandom(buf, buf_size, 0); /* retry synchronously */
> >      }
> 
> Yeah, the above is yet another example of completely broken garbage.
> 
> You can't just wait and block at boot. That is simply 100%
> unacceptable, and always has been, exactly because that may
> potentially mean waiting forever since you didn't do anything that
> actually is likely to add any entropy.
>

ACK, the systemd commit which introduced that code also does:

   => 26ded5570994 (random-seed: rework systemd-random-seed.service..)
    [...]
    --- a/units/systemd-random-seed.service.in
    +++ b/units/systemd-random-seed.service.in
    @@ -22,4 +22,9 @@ Type=oneshot
    RemainAfterExit=yes
    ExecStart=@rootlibexecdir@/systemd-random-seed load
    ExecStop=@rootlibexecdir@/systemd-random-seed save
   -TimeoutSec=30s
   +
   +# This service waits until the kernel's entropy pool is
   +# initialized, and may be used as ordering barrier for service
   +# that require an initialized entropy pool. Since initialization
   +# can take a while on entropy-starved systems, let's increase the
   +# time-out substantially here.
   +TimeoutSec=10min

This 10min wait thing is really broken... it's basically "forever".

> >      if (k < 0) {
> >          log_debug_errno(errno, "Failed to read random data with "
> >                          "getrandom(), falling back to "
> >                          "/dev/urandom: %m");
> 
> At least it gets a log message.
> 
> So I think the right thing to do is to just make getrandom() return
> -EINVAL, and refuse to block.
> 
> As mentioned, this has already historically been a huge issue on
> embedded devices, and with disks turning not just to NVMe but to
> actual polling nvdimm/xpoint/flash, the amount of true "entropy"
> randomness we can give at boot is very questionable.
>

ACK.

Moreover, and as a result of all that, distributions are now officially
*duct-taping* the problem:

    https://www.debian.org/releases/buster/amd64/release-notes/ch-information.en.html#entropy-starvation

    5.1.4. Daemons fail to start or system appears to hang during boot
  
    Due to systemd needing entropy during boot and the kernel treating
    such calls as blocking when available entropy is low, the system
    may hang for minutes to hours until the randomness subsystem is
    sufficiently initialized (random: crng init done).

"the system may hang for minutes to hours"...

> We can (and will) continue to do a best-effort thing (including very
> much using rdrand and friends), but the whole "wait for entropy"
> simply *must* stop.
> 
> > I've sent an RFC patch at [1].
> >
> > [1] https://lkml.kernel.org/r/20190914122500.GA1425@darwi-home-pc
> 
> Looks reasonable to me. Except I'd just make it simpler and make it a
> big WARN_ON_ONCE(), which is a lot harder to miss than pr_notice().
> Make it clear that it is a *bug* if user space thinks it should wait
> at boot time.
> 
> Also, we might even want to just fill the buffer and return 0 at that
> point, to make sure that even more broken user space doesn't then try
> to sleep manually and turn it into a "I'll wait myself" loop.
>

ACK, I'll send an RFC v2, returning buflen, and so on..

/me will enjoy the popcorn from all the to-be-reported WARN_ON()s
on distribution mailing lists ;-)

>                  Linus

thanks,

-- 
darwi
http://darwish.chasingpointers.com

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-14 21:11                           ` Ahmed S. Darwish
@ 2019-09-14 22:05                             ` Martin Steigerwald
  2019-09-14 22:24                             ` Theodore Y. Ts'o
  1 sibling, 0 replies; 211+ messages in thread
From: Martin Steigerwald @ 2019-09-14 22:05 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Linus Torvalds, Theodore Y. Ts'o, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, Lennart Poettering, lkml

Ahmed S. Darwish - 14.09.19, 23:11:26 CEST:
> > Yeah, the above is yet another example of completely broken garbage.
> > 
> > You can't just wait and block at boot. That is simply 100%
> > unacceptable, and always has been, exactly because that may
> > potentially mean waiting forever since you didn't do anything that
> > actually is likely to add any entropy.
> 
> ACK, the systemd commit which introduced that code also does:
> 
>    => 26ded5570994 (random-seed: rework systemd-random-seed.service..)
> [...]
>     --- a/units/systemd-random-seed.service.in
>     +++ b/units/systemd-random-seed.service.in
>     @@ -22,4 +22,9 @@ Type=oneshot
>     RemainAfterExit=yes
>     ExecStart=@rootlibexecdir@/systemd-random-seed load
>     ExecStop=@rootlibexecdir@/systemd-random-seed save
>    -TimeoutSec=30s
>    +
>    +# This service waits until the kernel's entropy pool is
>    +# initialized, and may be used as ordering barrier for service
>    +# that require an initialized entropy pool. Since initialization
>    +# can take a while on entropy-starved systems, let's increase the
>    +# time-out substantially here.
>    +TimeoutSec=10min
> 
> This 10min wait thing is really broken... it's basically "forever".

I am so happy to use Sysvinit on my systems again. Depending on entropy 
for just booting a machine is broken¹.

Of course regenerating SSH keys on boot, probably due to cloud-init 
replacing the old key after a VM has been cloned from template, may 
still be a challenge to handle well². I'd probably replace SSH keys in 
the background and restart the service then, but this may lead to 
spurious man in the middle warnings.


[1] Debian Buster release notes: 5.1.4. Daemons fail to start or system 
appears to hang during boot

https://www.debian.org/releases/stable/amd64/release-notes/ch-information.en.html#entropy-starvation

[2] Openssh taking minutes to become available, booting takes half an 
hour ... because your server waits for a few bytes of randomness

https://daniel-lange.com/archives/152-hello-buster.html

Thanks,
-- 
Martin



^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-14 21:11                           ` Ahmed S. Darwish
  2019-09-14 22:05                             ` Martin Steigerwald
@ 2019-09-14 22:24                             ` Theodore Y. Ts'o
  2019-09-14 22:32                               ` Linus Torvalds
  1 sibling, 1 reply; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-14 22:24 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Linus Torvalds, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	Lennart Poettering, lkml

On Sat, Sep 14, 2019 at 11:11:26PM +0200, Ahmed S. Darwish wrote:
> > > I've sent an RFC patch at [1].
> > >
> > > [1] https://lkml.kernel.org/r/20190914122500.GA1425@darwi-home-pc
> > 
> > Looks reasonable to me. Except I'd just make it simpler and make it a
> > big WARN_ON_ONCE(), which is a lot harder to miss than pr_notice().
> > Make it clear that it is a *bug* if user space thinks it should wait
> > at boot time.

So I'd really rather not make a change as fundamental as this so close
to 5.3 being released.  This sort of thing is subtle since essentially
what we're trying to do is to work around broken userspace, and worse,
in many cases, obstinate, determined userspace application programmers.
We've told them to avoid trying to generate cryptographically secure
random numbers for *years*.  And they haven't listened.

This is also a fairly major functional change which is likely to be
very visible to userspace applications, and so it is likely to cause
*some* kind of breakage.  So if/when this breaks applications, are we
going to then have to revert it?

> > Also, we might even want to just fill the buffer and return 0 at that
> > point, to make sure that even more broken user space doesn't then try
> > to sleep manually and turn it into a "I'll wait myself" loop.

Ugh.  This makes getrandom(2) unreliable for application programmers,
in that it returns success, but with the buffer filled with something
which is definitely not random.  Please, let's not.

Worse, it won't even accomplish anything against obstinate
programmers.  Someone who is going to change their program to sleep
loop waiting for getrandom(2) to not return with an error can just as
easily check for a buffer which is zero-filled, or an unchanged
buffer, and then sleep loop on that.  Again, remember we're trying to
work around malicious human beings --- except instead of trying to fight
malicious attackers, we're trying to fight malicious userspace
programmers.  This is not a fight we can win.  We can't make
getrandom(2) idiot-proof, because idiots are too d*mned ingenious :-)

For 5.3, can we please consider my proposal in [1]?

[1] https://lore.kernel.org/linux-ext4/20190914162719.GA19710@mit.edu/

We can try to discuss different ways of working around broken/stupid
userspace, but let's wait until after the LTS release, and ultimately,
I still think we need to just try to get more randomness from hardware
whichever way we can.  Pretty much all x86 laptops/desktops have TPMs.
So let's use that, in combination with RDRAND, and UEFI provided
randomness, etc.

And if we want to put in a big WARN_ON_ONCE, sure.  But we've tried
not blocking before, and that way didn't end well[2], with almost 10%
of all publicly accessible SSH keys across the entire internet being
shown to be weak by an academic researcher.  (This ruined my July 4th
holidays in 2012 when I was working on patches to fix this on very
short notice.)  So let's *please* not be hasty with changes here.
We're dealing with a complex system that includes some very
obstinate/strong personalities, including one which rhymes with
Loettering....

[2] https://factorable.net

						- Ted


^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-14 22:24                             ` Theodore Y. Ts'o
@ 2019-09-14 22:32                               ` Linus Torvalds
  2019-09-15  1:00                                 ` Theodore Y. Ts'o
  0 siblings, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-14 22:32 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Ahmed S. Darwish, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	Lennart Poettering, lkml

On Sat, Sep 14, 2019 at 3:24 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> > > Also, we might even want to just fill the buffer and return 0 at that
> > > point, to make sure that even more broken user space doesn't then try
> > > to sleep manually and turn it into a "I'll wait myself" loop.
>
> Ugh.  This makes getrandom(2) unreliable for application programmers,
> in that it returns success, but with the buffer filled with something
> which is definitely not random.  Please, let's not.

You misunderstand.

The buffer would always be filled with "as random as we can make it".
My "return zero" was for success, but Alexander pointed out that the
return value is the length, not "zero for success".

> Worse, it won't even accomplish anything against obstinate
> programmers.  Someone who is going to change their program to sleep
> loop waiting for getrandom(2) to not return with an error can just as
> easily check for a buffer which is zero-filled, or an unchanged
> buffer, and then sleep loop on that.

Again, no they can't. They'll get random data in the buffer. And
there is no way they can tell how much entropy that random data has.
Exactly the same way there is absolutely no way _we_ can tell how much
entropy we have.

> For 5.3, can we please consider my proposal in [1]?
>
> [1] https://lore.kernel.org/linux-ext4/20190914162719.GA19710@mit.edu/

Honestly, to me that looks *much* worse than just saying that we need
to stop allowing insane user mode boot programs make insane choices
that have no basis in reality.

It may be the safest thing to do, but at that point we might as well
just revert the ext4 change entirely. I'd rather do that, than have
random filesystems start making random decisions based on crazy user
space behavior.

                 Linus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-14 22:32                               ` Linus Torvalds
@ 2019-09-15  1:00                                 ` Theodore Y. Ts'o
  2019-09-15  1:10                                   ` Linus Torvalds
  0 siblings, 1 reply; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-15  1:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ahmed S. Darwish, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	Lennart Poettering, lkml

On Sat, Sep 14, 2019 at 03:32:46PM -0700, Linus Torvalds wrote:
> > Worse, it won't even accomplish anything against obstinate
> > programmers.  Someone who is going to change their program to sleep
> > loop waiting for getrandom(2) to not return with an error can just as
> > easily check for a buffer which is zero-filled, or an unchanged
> > buffer, and then sleep loop on that.
> 
> Again, no they can't. They'll get random data in the buffer. And
> there is no way they can tell how much entropy that random data has.

That makes me even more worried.  It's probably going to be OK for
modern x86 systems, since "best we can do" will include RDRAND
(whether or not it's trusted).  But on systems without something like
RDRAND --- e.g., ARM --- the "best we can do" could potentially be
Really Bad.  Again, look back at the Mining Your P's and Q's paper
from factorable.net.

If we don't block, and we just return "the best we can do", and some
insane userspace tries to generate a long-term private key (for SSH or
TLS) in super-early boot, I think we owe them something beyond a big
fat WARN_ON_ONCE.  We could return 0 for success, and yet "the best we
can do" could be really terrible.

> > For 5.3, can we please consider my proposal in [1]?
> >
> > [1] https://lore.kernel.org/linux-ext4/20190914162719.GA19710@mit.edu/
> 
> Honestly, to me that looks *much* worse than just saying that we need
> to stop allowing insane user mode boot programs make insane choices
> that have no basis in reality.
> 
> It may be the safest thing to do, but at that point we might as well
> just revert the ext4 change entirely. I'd rather do that, than have
> random filesystems start making random decisions based on crazy user
> space behavior.

All we're doing is omitting the plug; I disagree that it's really all
that random.  Honestly, I'd much rather just let distributions hang,
and force them to fix it that way.  That's *much* better than silently
giving them "the best we can do", which might be "not really random at
all".

The reality is that there will be some platforms where we will block
for a very long time, given certain kernel configs and certain really
stupid userspace decisions --- OR, we can open up a really massive
security hole given stupid userspace decisions.  Ext4 just got unlucky
that a performance improvement patch happened to toggle one or two
configurations from "working" to "not working".   

But just saying, "oh well" and returning something which might not
really be random with a success code is SUCH A TERRIBLE IDEA, that if
you really prefer that, I'll accept the ext4 revert, even though I
don't think we should be penalizing all ext4 performance just because
of a few distros being stupid.

If the choice is between that and making some unsuspecting
distributions potentially completely insecure, it's no contest.
I won't have that on my conscience.

						- Ted

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-15  1:00                                 ` Theodore Y. Ts'o
@ 2019-09-15  1:10                                   ` Linus Torvalds
  2019-09-15  2:05                                     ` Theodore Y. Ts'o
  0 siblings, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-15  1:10 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Ahmed S. Darwish, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	Lennart Poettering, lkml

On Sat, Sep 14, 2019 at 6:00 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> That makes me even more worried.  It's probably going to be OK for
> modern x86 systems, since "best we can do" will include RDRAND
> (whether or not it's trusted).  But on systems without something like
> RDRAND --- e.g., ARM --- the "best we can do" could potentially be
> Really Bad.  Again, look back at the Mining Your P's and Q's paper
> from factorable.net.

Yes. And they had that problem *because* the blocking interface was
useless, and they didn't use it, and *because* nobody warned them
about it.

In other words, the whole disaster was exactly because blocking is
wrong, and because blocking to get "secure" data is unacceptable.

And the random people DIDN'T LEARN A SINGLE LESSON from that thing.

Seriously. getrandom() introduced the same broken model as /dev/random
had - and that then caused people to use /dev/urandom instead.

And now it has shown itself to be broken _again_.

And you still argue against the only sane model. Scream loudly that
you're doing something wrong so that people can fix their broken
garbage, but don't let people block, which is _also_ broken garbage.

Seriously. Blocking is wrong. Blocking has _always_ been wrong. It was
why /dev/random was useless, and it is now why the new getrandom()
system call is showing itself useless.

> We could return 0 for success, and yet "the best we
> can do" could be really terrible.

Yes. Which is why we should warn.

But we can't *block*. Because that just breaks people. Like shown in
this whole discussion.

Why is warning different? Because hopefully it tells the only person
who can *do* something about it - the original maintainer or developer
of the user space tools - that they are doing something wrong and need
to fix their broken model.

Blocking doesn't do that. Blocking only makes the system unusable. And
yes, some security people think "unusable == secure", but honestly,
those security people shouldn't do system design. They are the worst
kind of "technically correct" incompetent.

> > > For 5.3, can we please consider my proposal in [1]?
> > It may be the safest thing to do, but at that point we might as well
> > just revert the ext4 change entirely. I'd rather do that, than have
> > random filesystems start making random decisions based on crazy user
> > space behavior.
>
> All we're doing is omitting the plug;

Yes. Which we'll do by reverting that change. I agree that it's the
safe thing to do for 5.3.

We are not adding crazy workarounds for "getrandom()" bugs in some
low-level filesystem.

Either we fix getrandom() or we revert the change. We don't do some
mis-designed "let's work around bugs in getrandom() in the ext4
filesystem with ad-hoc behavioral changes".

              Linus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-15  1:10                                   ` Linus Torvalds
@ 2019-09-15  2:05                                     ` Theodore Y. Ts'o
  2019-09-15  2:11                                       ` Linus Torvalds
                                                         ` (2 more replies)
  0 siblings, 3 replies; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-15  2:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ahmed S. Darwish, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	Lennart Poettering, lkml

On Sat, Sep 14, 2019 at 06:10:47PM -0700, Linus Torvalds wrote:
> > We could return 0 for success, and yet "the best we
> > can do" could be really terrible.
> 
> Yes. Which is why we should warn.

I'm all in favor of warning.  But people might just ignore the
warning.  We warn today about systemd trying to read from /dev/urandom
too early, and that just gets ignored.

> But we can't *block*. Because that just breaks people. Like shown in
> this whole discussion.

I'd be willing to let it take at least 2 minutes, since that's slow
enough to be annoying.  I'd be willing to kill the process which
tried to call getrandom too early.  But I believe blocking is better
than returning something potentially not random at all.  I think
failing "safe" is extremely important.  And returning something not
random which then gets used for a long-term private key is a disaster.

You basically want to turn getrandom into /dev/urandom.  And that's
how we got into the mess where 10% of the publicly accessible ssh
keys could be guessed.  I've tried that already, and we saw how that
ended.

> Why is warning different? Because hopefully it tells the only person
> who can *do* something about it - the original maintainer or developer
> of the user space tools - that they are doing something wrong and need
> to fix their broken model.

Except the developer could (and *has*) just ignored the warning, which
is what happened with /dev/urandom when it was accessed too early.
Even when I drew some developers' attention to the warning, at least
one just said, "meh", and blew me off.  Would making it noisier
(e.g., a WARN_ON) make enough of a difference?  I guess I'm just not
convinced.

> Blocking doesn't do that. Blocking only makes the system unusable. And
> yes, some security people think "unusable == secure", but honestly,
> those security people shouldn't do system design. They are the worst
> kind of "technically correct" incompetent.

Which is worse really depends on your point of view, and what the
system might be controlling.  If access to the system could cause a
malicious attacker to trigger a nuclear bomb, failing safe is always
going to be better.  In other cases, maybe failing open is certainly
more convenient.  It certainly leaves the system more "usable".  But
how do we trade off "usable" with "insecure"?  There are times when
"unusable" is WAY better than "could risk life or human safety".

Would you be willing to settle for a CONFIG option or a boot-command
line option which controls whether we fail "safe" or fail "open" if
someone calls getrandom(2) and there isn't enough entropy?  Then each
distribution and/or system integrator can decide whether "proper
systems design" considers "usability" versus "must not fail
insecurely" to be more important.   

						- Ted

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-15  2:05                                     ` Theodore Y. Ts'o
@ 2019-09-15  2:11                                       ` Linus Torvalds
  2019-09-15  6:33                                       ` Willy Tarreau
  2019-09-15  6:53                                       ` Willy Tarreau
  2 siblings, 0 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-15  2:11 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Ahmed S. Darwish, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	Lennart Poettering, lkml

On Sat, Sep 14, 2019 at 7:05 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> I'd be willing to let it take at least 2 minutes, since that's slow
> enough to be annoying.

Have you ever met a real human being?

A boot that blocks will result in people pressing the big red button
in less than 30 seconds, unless it talks very much about _why_ it
blocks and gives an estimate of how long.

Please go out and actually interact with real people some day.

            Linus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized
  2019-09-14 14:08                         ` Alexander E. Patrakov
@ 2019-09-15  5:22                           ` Theodore Y. Ts'o
  2019-09-15  8:17                             ` [PATCH RFC v3] random: getrandom(2): optionally block when " Ahmed S. Darwish
  2019-09-15 17:32                             ` [PATCH RFC v2] random: optionally block in getrandom(2) when the " Linus Torvalds
  0 siblings, 2 replies; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-15  5:22 UTC (permalink / raw)
  To: Alexander E. Patrakov
  Cc: Ahmed S. Darwish, Linus Torvalds, Michael Kerrisk,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	zhangjs, linux-ext4, lkml

getrandom() has been created as a new and more secure interface for
pseudorandom data requests.  Unlike /dev/urandom, it unconditionally
blocks until the entropy pool has been properly initialized.

While getrandom() has no guaranteed upper bound for its waiting time,
user-space has been abusing it by issuing the syscall, from shared
libraries no less, during the main system boot sequence.

Thus, on certain setups where there is no hwrng (embedded), or the
hwrng is not trusted by some users (intel RDRAND), or sometimes it's
just broken (amd RDRAND), the system boot can be *reliably* blocked.

The issue is further exacerbated by recent file-system optimizations,
e.g. b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), which
merges the directory lookup code's inode table I/O, and thus minimizes the
number of disk interrupts and entropy during boot. After that commit,
a blocked boot can be reliably reproduced on a Thinkpad E480 laptop
with standard ArchLinux user-space.

Thus, add an optional configuration option which stops getrandom(2)
from blocking, but instead returns "best efforts" randomness, which
might not be random or secure at all.  This can be controlled via
random.getrandom_block boot command line option, and the
CONFIG_RANDOM_BLOCK can be used to set the default to be blocking.
Since according to the Great Penguin, only incompetent system
designers would value "security" ahead of "usability", the default is
to be non-blocking.

In addition, modify getrandom(2) to complain loudly with a kernel
warning when some userspace process is erroneously calling
getrandom(2) too early during the boot process.

Link: https://lkml.kernel.org/r/CAHk-=wjyH910+JRBdZf_Y9G54c1M=LBF8NKXB6vJcm9XjLnRfg@mail.gmail.com
Link: https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc
Link: https://lkml.kernel.org/r/20190911173624.GI2740@mit.edu
Link: https://lkml.kernel.org/r/20180514003034.GI14763@thunk.org

[ Modified by tytso@mit.edu to make the change of getrandom(2) to be
  non-blocking to be optional. ]

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---

Here's my take on the patch.  I really very strongly believe that the
idea of making getrandom(2) non-blocking and to blindly assume that we
can load up the buffer with "best efforts" randomness to be a
terrible, terrible idea that is going to cause major security problems
that we will potentially regret very badly.  Linus Torvalds believes I
am an incompetent systems designer.

So let's do it both ways, and push the decision on the distributor
and/or product manufacturer

 drivers/char/Kconfig  | 33 +++++++++++++++++++++++++++++++--
 drivers/char/random.c | 34 +++++++++++++++++++++++++++++-----
 2 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 3e866885a405..337baeca5ebc 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -557,8 +557,6 @@ config ADI
 	  and SSM (Silicon Secured Memory).  Intended consumers of this
 	  driver include crash and makedumpfile.
 
-endmenu
-
 config RANDOM_TRUST_CPU
 	bool "Trust the CPU manufacturer to initialize Linux's CRNG"
 	depends on X86 || S390 || PPC
@@ -573,3 +571,34 @@ config RANDOM_TRUST_CPU
 	has not installed a hidden back door to compromise the CPU's
 	random number generation facilities. This can also be configured
 	at boot with "random.trust_cpu=on/off".
+
+config RANDOM_BLOCK
+	bool "Block if getrandom is called before CRNG is initialized"
+	help
+	  Say Y here if you want userspace programs which call
+	  getrandom(2) before the Cryptographic Random Number
+	  Generator (CRNG) is initialized to block until
+	  secure random numbers are available.
+
+	  Say N if you believe usability is more important than
+	  security, so if getrandom(2) is called before the CRNG is
+	  initialized, it should not block, but instead return "best
+	  effort" randomness which might not be very secure or random
+	  at all; but at least the system boot will not be delayed by
+	  minutes or hours.
+
+	  This can also be controlled at boot with
+	  "random.getrandom_block=on/off".
+
+	  Ideally, systems would be configured with hardware random
+	  number generators, and/or configured to trust CPU-provided
+	  RNG's.  In addition, userspace should generate cryptographic
+	  keys only as late as possible, when they are needed, instead
+	  of during early boot.  (For non-cryptographic use cases,
+	  such as dictionary seeds or MIT Magic Cookies, other
+	  mechanisms such as /dev/urandom or random(3) may be more
+	  appropriate.)  This config option controls what the
+	  kernel should do as a fallback when the non-ideal case
+	  presents itself.
+
+endmenu
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 5d5ea4ce1442..243fb4a4535f 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -511,6 +511,8 @@ static struct ratelimit_state unseeded_warning =
 	RATELIMIT_STATE_INIT("warn_unseeded_randomness", HZ, 3);
 static struct ratelimit_state urandom_warning =
 	RATELIMIT_STATE_INIT("warn_urandom_randomness", HZ, 3);
+static struct ratelimit_state getrandom_warning =
+	RATELIMIT_STATE_INIT("warn_getrandom_randomness", HZ, 3);
 
 static int ratelimit_disable __read_mostly;
 
@@ -854,12 +856,19 @@ static void invalidate_batched_entropy(void);
 static void numa_crng_init(void);
 
 static bool trust_cpu __ro_after_init = IS_ENABLED(CONFIG_RANDOM_TRUST_CPU);
+static bool getrandom_block __ro_after_init = IS_ENABLED(CONFIG_RANDOM_BLOCK);
 static int __init parse_trust_cpu(char *arg)
 {
 	return kstrtobool(arg, &trust_cpu);
 }
 early_param("random.trust_cpu", parse_trust_cpu);
 
+static int __init parse_block(char *arg)
+{
+	return kstrtobool(arg, &getrandom_block);
+}
+early_param("random.getrandom_block", parse_block);
+
 static void crng_initialize(struct crng_state *crng)
 {
 	int		i;
@@ -1045,6 +1054,12 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
 				  urandom_warning.missed);
 			urandom_warning.missed = 0;
 		}
+		if (getrandom_warning.missed) {
+			pr_notice("random: %d getrandom warning(s) missed "
+				  "due to ratelimiting\n",
+				  getrandom_warning.missed);
+			getrandom_warning.missed = 0;
+		}
 	}
 }
 
@@ -1900,6 +1915,7 @@ int __init rand_initialize(void)
 	crng_global_init_time = jiffies;
 	if (ratelimit_disable) {
 		urandom_warning.interval = 0;
+		getrandom_warning.interval = 0;
 		unseeded_warning.interval = 0;
 	}
 	return 0;
@@ -1969,8 +1985,8 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 	if (!crng_ready() && maxwarn > 0) {
 		maxwarn--;
 		if (__ratelimit(&urandom_warning))
-			printk(KERN_NOTICE "random: %s: uninitialized "
-			       "urandom read (%zd bytes read)\n",
+			pr_err("random: %s: CRNG uninitialized "
+			       "(%zd bytes read)\n",
 			       current->comm, nbytes);
 		spin_lock_irqsave(&primary_crng.lock, flags);
 		crng_init_cnt = 0;
@@ -2135,9 +2151,17 @@ SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
 	if (!crng_ready()) {
 		if (flags & GRND_NONBLOCK)
 			return -EAGAIN;
-		ret = wait_for_random_bytes();
-		if (unlikely(ret))
-			return ret;
+		WARN_ON_ONCE(1);
+		if (getrandom_block) {
+			if (__ratelimit(&getrandom_warning))
+				pr_err("random: %s: getrandom blocking for CRNG initialization\n",
+				       current->comm);
+			ret = wait_for_random_bytes();
+			if (unlikely(ret))
+				return ret;
+		} else if (__ratelimit(&getrandom_warning))
+			pr_err("random: %s: getrandom called too early\n",
+			       current->comm);
 	}
 	return urandom_read(NULL, buf, count, NULL);
 }
-- 
2.23.0
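
The branch that the random.c hunk above adds to getrandom(2) can be
summarized as a small decision table. What follows is only a userspace
model of that hunk, not kernel code: the enum, its labels and
model_getrandom() are hypothetical names introduced for illustration,
and the GRND_RANDOM path, which lies outside the hunk, is not modelled.

```c
#include <stdbool.h>

/* Hypothetical outcome labels; none of these names exist in the kernel. */
enum getrandom_outcome {
	OUT_READ,		/* CRNG ready: plain urandom_read() */
	OUT_EAGAIN,		/* GRND_NONBLOCK set: return -EAGAIN */
	OUT_WARN_AND_BLOCK,	/* warn, wait_for_random_bytes(), then read */
	OUT_WARN_AND_READ,	/* warn, then return best-effort bytes */
};

/*
 * Model of the patched branch: the three inputs stand for crng_ready(),
 * (flags & GRND_NONBLOCK) and the random.getrandom_block setting.
 */
static enum getrandom_outcome
model_getrandom(bool crng_ready, bool nonblock_flag, bool getrandom_block)
{
	if (crng_ready)
		return OUT_READ;
	if (nonblock_flag)
		return OUT_EAGAIN;
	/* WARN_ON_ONCE(1) fires here in the patch, in both sub-cases. */
	return getrandom_block ? OUT_WARN_AND_BLOCK : OUT_WARN_AND_READ;
}
```

In this model the boot parameter only matters when the CRNG is not yet
ready and the caller did not pass GRND_NONBLOCK; every other caller sees
the same behavior as before the patch.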


^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-15  2:05                                     ` Theodore Y. Ts'o
  2019-09-15  2:11                                       ` Linus Torvalds
@ 2019-09-15  6:33                                       ` Willy Tarreau
  2019-09-15  6:53                                       ` Willy Tarreau
  2 siblings, 0 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-15  6:33 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Linus Torvalds, Ahmed S. Darwish, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, Lennart Poettering, lkml

On Sat, Sep 14, 2019 at 10:05:21PM -0400, Theodore Y. Ts'o wrote:
> I'd be willing to let it take at least 2 minutes, since that's slow
> enough to be annoying.

It's an eternity, and prevents a backup system from being turned on in
time to replace a dead system. In fact the main problem with this is
that it destroys uptime on already configured systems for the sake of
making sure a private SSH key is produced correctly. It turns out that
if we instead tell this tool that the produced random data is not
strong, the only tool that requires good entropy will be able to
ask the user to type something to add real entropy. But making the
system wait forever will not bring any extra entropy because the
services cannot start, it will not even receive network traffic and
will not be able to collect entropy. Sorry Ted, but I've been hit by
this already. It's a real problem to see a system fail to finish booting
after a crash when you know your systems have only 5 minutes of total
downtime allowed per year (5 nines). And when the SSH keys, like the
rest of the config, were supposed to be either synchronized from the
network or pre-populated in a system image, nobody finds this a valid
justification for an extended downtime.

> Except the developer could (and *has) just ignored the warning, which
> is what happened with /dev/urandom when it was accessed too early.

That's why it's nice to have getrandom() return the error : it will
for once allow the developer of the program to care depending on the
program. Those proposing to choose the pieces to present in Tetris
will not care, those trying to generate an SSH key will care and will
have solid and well known fallbacks. And the rare ones who need good
randoms and ignore the error will be the ones *responsible* for this,
it will not be the kernel anymore giving bad random.

BTW I was thinking that EAGAIN was semantically better than EINVAL to
indicate that the same call should be done with blocking.
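
For illustration, here is a minimal userspace sketch of the caller-side
decision described above, using only the existing GRND_NONBLOCK flag.
The helper name fill_random_nonblock() is made up for this example. It
reports -EAGAIN when the CRNG is not yet initialized, so that each
program can choose between a fallback, a prompt to the user, or a
blocking retry:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of any library.
 *
 * Returns 0 once buf holds len bytes from an initialized CRNG,
 * -EAGAIN if the CRNG is not initialized yet, or another negative
 * errno value on failure. It never blocks waiting for entropy.
 */
static int fill_random_nonblock(void *buf, size_t len)
{
	unsigned char *p = buf;

	while (len > 0) {
		ssize_t n = getrandom(p, len, GRND_NONBLOCK);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, just retry */
			return -errno;		/* includes -EAGAIN */
		}
		/* Short reads are possible for large requests. */
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

A key generator would treat -EAGAIN as a reason to stop, ask the user
for input, or retry with flags set to 0, while a program choosing Tetris
pieces could simply fall back to any cheap PRNG.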

Just my two cents,
Willy

* Re: Linux 5.3-rc8
  2019-09-14 16:30                         ` Linus Torvalds
  2019-09-14 16:35                           ` Alexander E. Patrakov
  2019-09-14 21:11                           ` Ahmed S. Darwish
@ 2019-09-15  6:51                           ` Lennart Poettering
  2019-09-15  7:27                             ` Ahmed S. Darwish
  2019-09-15 16:29                             ` Linus Torvalds
  2019-09-23 20:49                           ` chaos generating driver was " Pavel Machek
  3 siblings, 2 replies; 211+ messages in thread
From: Lennart Poettering @ 2019-09-15  6:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ahmed S. Darwish, Theodore Y. Ts'o, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Sa, 14.09.19 09:30, Linus Torvalds (torvalds@linux-foundation.org) wrote:

> >     => src/random-seed/random-seed.c:
> >     /*
> >      * Let's make this whole job asynchronous, i.e. let's make
> >      * ourselves a barrier for proper initialization of the
> >      * random pool.
> >      */
> >      k = getrandom(buf, buf_size, GRND_NONBLOCK);
> >      if (k < 0 && errno == EAGAIN && synchronous) {
> >          log_notice("Kernel entropy pool is not initialized yet, "
> >                     "waiting until it is.");
> >
> >          k = getrandom(buf, buf_size, 0); /* retry synchronously */
> >      }
>
> Yeah, the above is yet another example of completely broken garbage.
>
> You can't just wait and block at boot. That is simply 100%
> unacceptable, and always has been, exactly because that may
> potentially mean waiting forever since you didn't do anything that
> actually is likely to add any entropy.

Oh man. Just spend 5min to understand the situation, before claiming
this was garbage or that was garbage. The code above does not block
boot. It blocks startup of services that explicitly order themselves
after the code above. There's only a few services that should do that,
and the main system boots up just fine without waiting for this.

Primary example for stuff that orders itself after the above,
correctly: cryptsetup entries that specify /dev/urandom as password
source (i.e. swap space and stuff, that wants a new key on every
boot). If we don't wait for the initialized pool for cases like that
the password for that swap space is not actually going to be random,
and that defeats its purpose.

Another example: the storing of an updated random seed file on
disk. We should only do that if the seed on disk is actually properly
random, i.e. comes from an initialized pool. Hence we wait for the
pool to be initialized before reading the seed from the pool, and
writing it to disk.

I'd argue that doing things like this is not "garbage", like you say,
but *necessary* to make this stuff safe and secure.

And no, other stuff is not delayed for this (but there are bugs of
course, some random services in 3rd party packages that set too
aggressive deps, but that needs to be fixed there, and not in the
kernel).

Anyway, I really don't appreciate your tone, and being sucked into
messy LKML discussions. I generally stay away from LKML, and gah, you
remind me why. Just tone it down, not everything you never bothered to
understand is "garbage".

And please don't break /dev/urandom again. The above code is the only
way I see how we can make /dev/urandom-derived swap encryption safe,
and the only way I can see how we can sanely write a valid random seed
to disk after boot. You guys changed semantics on /dev/urandom all the
time in the past, don't break API again, thank you very much.

Lennart

* Re: Linux 5.3-rc8
  2019-09-15  2:05                                     ` Theodore Y. Ts'o
  2019-09-15  2:11                                       ` Linus Torvalds
  2019-09-15  6:33                                       ` Willy Tarreau
@ 2019-09-15  6:53                                       ` Willy Tarreau
  2 siblings, 0 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-15  6:53 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Linus Torvalds, Ahmed S. Darwish, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, Lennart Poettering, lkml

On Sat, Sep 14, 2019 at 10:05:21PM -0400, Theodore Y. Ts'o wrote:
> You basically want to turn getrandom into /dev/urandom.  And that's
> how we got into the mess where 10% of the publicly accessible ssh
> keys could be guessed.

Not exactly. This was an *API* issue that created this situation: the
fact that you had a single random() call in the libc, either mapped
to /dev/urandom or to /dev/random. By then many of us were used to relying
on one or the other and finding systems where /dev/random was a symlink
to /dev/urandom to avoid blocking was extremely common. In fact it was
caused by the exact same situation: we try to enforce good random for
everyone, it cannot work all the time and breaks programs which do not
need such randoms, so the user breaks the trust in randomness by
configuring the system so that randoms work all the time for the most
common programs. And that's how you end up with SSH trusting a broken
random generator without knowing it was misconfigured.

Your getrandom() API does have the ability to fix this. In my opinion
the best way to proceed is to consider that all those who don't care
about randomness quality never block and that those who care can be
sure they will either get good randoms or will know about it. Ideally
calling getrandom() without any flag should be equivalent to what you
have with /dev/urandom and be good enough to put a UUID on a file
system. And calling it with "SECURE" or something like this will be
the indication that it will not betray you and will only return good
randoms (which is what GRND_RANDOM does in my opinion).

The huge difference between getrandom() and /dev/*random here is that
each application can decide what type of random to use without relying
on what system-wide breakage was applied just for the sake of fixing
another simple application. This could even help OpenSSL use two different
calls for RAND_bytes() and RAND_pseudo_bytes(), instead of using the
same call and blocking.

Last but not least, I think we need to educate developers regarding
random number consumption, asking "if you could produce only 16 bytes
of random in your whole system's lifetime, where would you use them?".
Entropy is extremely precious and yet the most poorly used resource. I
almost wouldn't mind seeing GRND_RANDOM requiring a special capability
since it does have a system-wide impact!

Regards,
Willy

* Re: Linux 5.3-rc8
  2019-09-14 16:52                             ` Linus Torvalds
  2019-09-14 17:09                               ` Alexander E. Patrakov
@ 2019-09-15  6:56                               ` Lennart Poettering
  2019-09-15  7:01                                 ` Willy Tarreau
  2019-09-15 17:02                                 ` Linus Torvalds
  1 sibling, 2 replies; 211+ messages in thread
From: Lennart Poettering @ 2019-09-15  6:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander E. Patrakov, Ahmed S. Darwish, Theodore Y. Ts'o,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	zhangjs, linux-ext4, lkml

On Sa, 14.09.19 09:52, Linus Torvalds (torvalds@linux-foundation.org) wrote:

> On Sat, Sep 14, 2019 at 9:35 AM Alexander E. Patrakov
> <patrakov@gmail.com> wrote:
> >
> > Let me repeat: not -EINVAL, please. Please find some other error code,
> > so that the application could sensibly distinguish between this case
> > (low quality entropy is in the buffer) and the "kernel is too dumb" case
> > (and no entropy is in the buffer).
>
> I'm not convinced we want applications to see that difference.
>
> The fact is, every time an application thinks it cares, it has caused
> problems. I can just see systemd saying "ok, the kernel didn't block,
> so I'll just do
>
>    while (getrandom(x) == -ENOENTROPY)
>        sleep(1);
>
> instead. Which is still completely buggy garbage.
>
> The fact is, we can't guarantee entropy in general. It's probably
> there in practice, particularly with user space saving randomness from
> last boot etc, but that kind of data may be real entropy, but the
> kernel cannot *guarantee* that it is.

I am not expecting the kernel to guarantee entropy. I'm just expecting
the kernel to not give me garbage knowingly. It's OK if it gives me
garbage unknowingly, but I have a problem if it gives me trash all the
time.

There's benefit in being able to wait until the pool is initialized
before we update the random seed stored on disk with a new one, and
there's benefit in being able to wait until the pool is initialized
before we let cryptsetup read a fresh, one-time key for dm-crypt from
/dev/urandom. I fully understand that any such reporting for
initialization is "best-effort", i.e. to the point where we don't know
anything to the contrary, but at least give userspace that.

Lennart

--
Lennart Poettering, Berlin

* Re: Linux 5.3-rc8
  2019-09-15  6:56                               ` Lennart Poettering
@ 2019-09-15  7:01                                 ` Willy Tarreau
  2019-09-15  7:05                                   ` Lennart Poettering
  2019-09-15 17:02                                 ` Linus Torvalds
  1 sibling, 1 reply; 211+ messages in thread
From: Willy Tarreau @ 2019-09-15  7:01 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Linus Torvalds, Alexander E. Patrakov, Ahmed S. Darwish,
	Theodore Y. Ts'o, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml

On Sun, Sep 15, 2019 at 08:56:55AM +0200, Lennart Poettering wrote:
> There's benefit in being able to wait until the pool is initialized
> before we update the random seed stored on disk with a new one,

And what exactly makes you think that waiting with arms crossed not
doing anything else has any chance to make the situation change if
you already had no such entropy available when reaching that first
call, especially during early boot ?

Willy

* Re: Linux 5.3-rc8
  2019-09-15  7:01                                 ` Willy Tarreau
@ 2019-09-15  7:05                                   ` Lennart Poettering
  2019-09-15  7:07                                     ` Willy Tarreau
  0 siblings, 1 reply; 211+ messages in thread
From: Lennart Poettering @ 2019-09-15  7:05 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, Alexander E. Patrakov, Ahmed S. Darwish,
	Theodore Y. Ts'o, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml

On So, 15.09.19 09:01, Willy Tarreau (w@1wt.eu) wrote:

> On Sun, Sep 15, 2019 at 08:56:55AM +0200, Lennart Poettering wrote:
> > There's benefit in being able to wait until the pool is initialized
> > before we update the random seed stored on disk with a new one,
>
> And what exactly makes you think that waiting with arms crossed not
> doing anything else has any chance to make the situation change if
> you already had no such entropy available when reaching that first
> call, especially during early boot ?

That code can finish 5h after boot, it's entirely fine with this
specific usecase.

Again: we don't delay "the boot" for this. We just delay "writing a
new seed to disk" for this. And if that is 5h later, then that's
totally fine, because in the meantime it's just one bg process more that
hangs around waiting to do what it needs to do.

Lennart

--
Lennart Poettering, Berlin

* Re: Linux 5.3-rc8
  2019-09-15  7:05                                   ` Lennart Poettering
@ 2019-09-15  7:07                                     ` Willy Tarreau
  2019-09-15  8:34                                       ` Lennart Poettering
  0 siblings, 1 reply; 211+ messages in thread
From: Willy Tarreau @ 2019-09-15  7:07 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Linus Torvalds, Alexander E. Patrakov, Ahmed S. Darwish,
	Theodore Y. Ts'o, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml

On Sun, Sep 15, 2019 at 09:05:41AM +0200, Lennart Poettering wrote:
> On So, 15.09.19 09:01, Willy Tarreau (w@1wt.eu) wrote:
> 
> > On Sun, Sep 15, 2019 at 08:56:55AM +0200, Lennart Poettering wrote:
> > > There's benefit in being able to wait until the pool is initialized
> > > before we update the random seed stored on disk with a new one,
> >
> > And what exactly makes you think that waiting with arms crossed not
> > doing anything else has any chance to make the situation change if
> > you already had no such entropy available when reaching that first
> > call, especially during early boot ?
> 
> That code can finish 5h after boot, it's entirely fine with this
> specific usecase.
> 
> Again: we don't delay "the boot" for this. We just delay "writing a
> new seed to disk" for this. And if that is 5h later, then that's
> totally fine, because in the meantime it's just one bg process more that
> hangs around waiting to do what it needs to do.

Didn't you say it could also happen when using encrypted swap ? If so
I suspect this could happen very early during boot, before any services
may be started ?

Willy

* Re: Linux 5.3-rc8
  2019-09-15  6:51                           ` Lennart Poettering
@ 2019-09-15  7:27                             ` Ahmed S. Darwish
  2019-09-15  8:48                               ` Lennart Poettering
  2019-09-15 16:29                             ` Linus Torvalds
  1 sibling, 1 reply; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-15  7:27 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Linus Torvalds, Theodore Y. Ts'o, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Sun, Sep 15, 2019 at 08:51:42AM +0200, Lennart Poettering wrote:
> On Sa, 14.09.19 09:30, Linus Torvalds (torvalds@linux-foundation.org) wrote:
[...]
> 
> And please don't break /dev/urandom again. The above code is the only
> way I see how we can make /dev/urandom-derived swap encryption safe,
> and the only way I can see how we can sanely write a valid random seed
> to disk after boot.
>

Any hope in making systemd-random-seed(8) credit that "random seed
from previous boot" file, through RNDADDENTROPY, *by default*?

Because of course this makes the problem reliably go away on my system
too (as discussed in the original bug report, but you were not CCed).

I know that by v243, just released 12 days ago, this can be optionally
done through SYSTEMD_RANDOM_SEED_CREDIT=1. I wonder though if it can
ever be done by default, just like what the BSDs do... This would
solve a big part of the current problem.

> Lennart

thanks,

-- 
darwi
http://darwish.chasingpointers.com

* [PATCH RFC v3] random: getrandom(2): optionally block when CRNG is uninitialized
  2019-09-15  5:22                           ` [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized Theodore Y. Ts'o
@ 2019-09-15  8:17                             ` Ahmed S. Darwish
  2019-09-15  8:59                               ` Lennart Poettering
  2019-09-15 17:32                             ` [PATCH RFC v2] random: optionally block in getrandom(2) when the " Linus Torvalds
  1 sibling, 1 reply; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-15  8:17 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Linus Torvalds, Alexander E. Patrakov, Michael Kerrisk,
	Lennart Poettering, Willy Tarreau, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4, lkml

In Linux v3.17, getrandom() was introduced as a new and more
secure interface for pseudorandom data requests. It attempted to solve
three problems as compared to /dev/urandom:

  1. the need to access filesystem paths, which can fail, e.g. under a
     chroot

  2. the need to open a file descriptor, which can fail under file
     descriptor exhaustion attacks

  3. the possibility to get not-so-random data from /dev/urandom, due to
     an incompletely initialized kernel entropy pool

To solve the third problem, getrandom(2) was made to block until a
proper amount of entropy has been accumulated. This basically made the
system call have no guaranteed upper-bound for its waiting time.

As was said in c6e9d6f38894 (random: introduce getrandom(2) system
call): "Any userspace program which uses this new functionality must
take care to assure that if it is used during the boot process, that it
will not cause the init scripts or other portions of the system startup
to hang indefinitely."

Meanwhile, user-facing Linux documentation, e.g. the urandom(4) and
getrandom(2) manpages, didn't add such explicit warnings. It also
didn't help that glibc, since v2.25, implemented an "OpenBSD-like"
getentropy(3) in terms of getrandom(2).  OpenBSD getentropy(2) never
blocked though, while the Linux glibc version did, possibly indefinitely.
Since that glibc change, even more applications on the boot path began
to implicitly request randomness through getrandom(2); e.g., for an
Xorg/Xwayland MIT cookie.

OpenBSD getentropy(2) never blocked because, as stated in its rnd(4)
manpages, it saves entropy to disk on shutdown and restores it on boot.
Moreover, the NetBSD bootloader, as shown in its boot(8), even has
special commands to load a random seed file and pass it to the kernel.
Meanwhile on a Linux systemd userland, systemd-random-seed(8) preserved
a random seed across reboots at /var/lib/systemd/random-seed, but it
never had the actual code, until very recently at v243, to ask the
kernel to credit such entropy through an RNDADDENTROPY ioctl.

From a mix of the above factors, it became common for Embedded
Linux systems to "get stuck at boot" unless a daemon like haveged is
installed, or the BSP provider enables the necessary hwrng driver
and credits its entropy; e.g. 62f95ae805fa (hwrng: omap - Set
default quality). Over time, the issue even began to creep into
consumer-level x86 laptops: mainstream distributions, like Debian
buster, began to recommend installing haveged as a workaround.

Thus, on certain setups where there is no hwrng (embedded systems or VMs
on a host lacking virtio-rng), or the hwrng is not trusted by some users
(Intel RDRAND), or sometimes it's just broken (AMD RDRAND), the system
boot can be *reliably* blocked.

It can therefore be argued that there is no way to use getrandom() on
Linux correctly, especially from shared libraries: GRND_NONBLOCK has
to be used, and a fallback to some other interface like /dev/urandom
is required, thus making the net result no better than just using
/dev/urandom unconditionally.

The issue is further exacerbated by recent file-system optimizations,
e.g. b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), which batches
the inode table I/O issued by directory lookups, and thus reduces the
number of disk interrupts, and the entropy they provide, during boot.
After that commit, a blocked boot can be reliably reproduced on a
Thinkpad E480 laptop with standard ArchLinux user-space.

Thus, don't trust user-space to call getrandom(2) from the right
context. Never block, by default, and just return data from the
urandom source if entropy is not yet available. This is an explicit
decision not to let user-space work around this through busy loops on
error-codes.

Note: this lowers the quality of random data returned by getrandom(2)
to the level of randomness returned by /dev/urandom, with all the
original security implications coming out of that, as discussed in
problem "3." at the top of this commit log. If this is not desirable,
offer users a fallback to the old behavior, through CONFIG_RANDOM_BLOCK=y
or the random.getrandom_block=on boot parameter.

[tytso@mit.edu: make the change to a non-blocking getrandom(2) optional]
Link: https://lkml.kernel.org/r/20190914222432.GC19710@mit.edu
Link: https://lkml.kernel.org/r/20190911173624.GI2740@mit.edu
Link: https://factorable.net ("Widespread Weak Keys in Network Devices")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/CAHk-=wjyH910+JRBdZf_Y9G54c1M=LBF8NKXB6vJcm9XjLnRfg@mail.gmail.com
Reported-by: Ahmed S. Darwish <darwish.07@gmail.com>
Link: https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
---

Notes:
    changelog-v2:
      - tytso: make blocking optional
    
    changelog-v3:
      - more detailed commit log + historical context (thanks patrakov)
      - remove WARN_ON_ONCE. It's pretty excessive, and the first caller
        is systemd-random-seed(8), which we know will not change.
        Just print errors in the kernel log.
    
    $dmesg | grep random:
    
      [0.235843] random: get_random_bytes called from start_kernel+0x30f/0x4d7 with crng_init=0
      [0.685682] random: fast init done
      [2.405263] random: lvm: CRNG uninitialized (4 bytes read)
      [2.480686] random: systemd-random-: getrandom (512 bytes): CRNG not yet initialized
      [2.480687] random: systemd-random-: CRNG uninitialized (512 bytes read)
      [3.265201] random: dbus-daemon: CRNG uninitialized (12 bytes read)
      [3.835066] urandom_read: 1 callbacks suppressed
      [3.835068] random: polkitd: CRNG uninitialized (8 bytes read)
      [3.835509] random: polkitd: CRNG uninitialized (8 bytes read)
      [3.835577] random: polkitd: CRNG uninitialized (8 bytes read)
      [4.190653] random: gnome-session-b: getrandom (16 bytes): CRNG not yet initialized
      [4.190658] random: gnome-session-b: getrandom (16 bytes): CRNG not yet initialized
      [4.190662] random: gnome-session-b: getrandom (16 bytes): CRNG not yet initialized
      [4.952299] random: crng init done
      [4.952311] random: 3 urandom warning(s) missed due to ratelimiting
      [4.952314] random: 1 getrandom warning(s) missed due to ratelimiting

 drivers/char/Kconfig  | 33 +++++++++++++++++++++++++++++++--
 drivers/char/random.c | 33 ++++++++++++++++++++++++++++-----
 2 files changed, 59 insertions(+), 7 deletions(-)

diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 3e866885a405..337baeca5ebc 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -557,8 +557,6 @@ config ADI
 	  and SSM (Silicon Secured Memory).  Intended consumers of this
 	  driver include crash and makedumpfile.
 
-endmenu
-
 config RANDOM_TRUST_CPU
 	bool "Trust the CPU manufacturer to initialize Linux's CRNG"
 	depends on X86 || S390 || PPC
@@ -573,3 +571,34 @@ config RANDOM_TRUST_CPU
 	has not installed a hidden back door to compromise the CPU's
 	random number generation facilities. This can also be configured
 	at boot with "random.trust_cpu=on/off".
+
+config RANDOM_BLOCK
+	bool "Block if getrandom is called before CRNG is initialized"
+	help
+	  Say Y here if you want userspace programs which call
+	  getrandom(2) before the Cryptographic Random Number
+	  Generator (CRNG) is initialized to block until
+	  secure random numbers are available.
+
+	  Say N if you believe usability is more important than
+	  security, so if getrandom(2) is called before the CRNG is
+	  initialized, it should not block, but instead return "best
+	  effort" randomness which might not be very secure or random
+	  at all; but at least the system boot will not be delayed by
+	  minutes or hours.
+
+	  This can also be controlled at boot with
+	  "random.getrandom_block=on/off".
+
+	  Ideally, systems would be configured with hardware random
+	  number generators, and/or configured to trust CPU-provided
+	  RNG's.  In addition, userspace should generate cryptographic
+	  keys only as late as possible, when they are needed, instead
+	  of during early boot.  (For non-cryptographic use cases,
+	  such as dictionary seeds or MIT Magic Cookies, other
+	  mechanisms such as /dev/urandom or random(3) may be more
+	  appropriate.)  This config option controls what the
+	  kernel should do as a fallback when the non-ideal case
+	  presents itself.
+
+endmenu
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 4a50ee2c230d..689fdb486785 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -511,6 +511,8 @@ static struct ratelimit_state unseeded_warning =
 	RATELIMIT_STATE_INIT("warn_unseeded_randomness", HZ, 3);
 static struct ratelimit_state urandom_warning =
 	RATELIMIT_STATE_INIT("warn_urandom_randomness", HZ, 3);
+static struct ratelimit_state getrandom_warning =
+	RATELIMIT_STATE_INIT("warn_getrandom_randomness", HZ, 3);
 
 static int ratelimit_disable __read_mostly;
 
@@ -854,12 +856,19 @@ static void invalidate_batched_entropy(void);
 static void numa_crng_init(void);
 
 static bool trust_cpu __ro_after_init = IS_ENABLED(CONFIG_RANDOM_TRUST_CPU);
+static bool getrandom_block __ro_after_init = IS_ENABLED(CONFIG_RANDOM_BLOCK);
 static int __init parse_trust_cpu(char *arg)
 {
 	return kstrtobool(arg, &trust_cpu);
 }
 early_param("random.trust_cpu", parse_trust_cpu);
 
+static int __init parse_block(char *arg)
+{
+	return kstrtobool(arg, &getrandom_block);
+}
+early_param("random.getrandom_block", parse_block);
+
 static void crng_initialize(struct crng_state *crng)
 {
 	int		i;
@@ -1053,6 +1062,12 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
 				  urandom_warning.missed);
 			urandom_warning.missed = 0;
 		}
+		if (getrandom_warning.missed) {
+			pr_notice("random: %d getrandom warning(s) missed "
+				  "due to ratelimiting\n",
+				  getrandom_warning.missed);
+			getrandom_warning.missed = 0;
+		}
 	}
 }
 
@@ -1915,6 +1930,7 @@ int __init rand_initialize(void)
 	crng_global_init_time = jiffies;
 	if (ratelimit_disable) {
 		urandom_warning.interval = 0;
+		getrandom_warning.interval = 0;
 		unseeded_warning.interval = 0;
 	}
 	return 0;
@@ -1984,8 +2000,8 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 	if (!crng_ready() && maxwarn > 0) {
 		maxwarn--;
 		if (__ratelimit(&urandom_warning))
-			printk(KERN_NOTICE "random: %s: uninitialized "
-			       "urandom read (%zd bytes read)\n",
+			pr_err("random: %s: CRNG uninitialized "
+			       "(%zd bytes read)\n",
 			       current->comm, nbytes);
 		spin_lock_irqsave(&primary_crng.lock, flags);
 		crng_init_cnt = 0;
@@ -2152,9 +2168,16 @@ SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
 	if (!crng_ready()) {
 		if (flags & GRND_NONBLOCK)
 			return -EAGAIN;
-		ret = wait_for_random_bytes();
-		if (unlikely(ret))
-			return ret;
+
+		if (__ratelimit(&getrandom_warning))
+			pr_err("random: %s: getrandom (%zd bytes): CRNG not "
+			       "yet initialized\n", current->comm, count);
+
+		if (getrandom_block) {
+			ret = wait_for_random_bytes();
+			if (unlikely(ret))
+				return ret;
+		}
 	}
 	return urandom_read(NULL, buf, count, NULL);
 }
-- 
darwi
http://darwish.chasingpointers.com

* Re: Linux 5.3-rc8
  2019-09-15  7:07                                     ` Willy Tarreau
@ 2019-09-15  8:34                                       ` Lennart Poettering
  0 siblings, 0 replies; 211+ messages in thread
From: Lennart Poettering @ 2019-09-15  8:34 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, Alexander E. Patrakov, Ahmed S. Darwish,
	Theodore Y. Ts'o, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml

On So, 15.09.19 09:07, Willy Tarreau (w@1wt.eu) wrote:

> > That code can finish 5h after boot, it's entirely fine with this
> > specific usecase.
> >
> > Again: we don't delay "the boot" for this. We just delay "writing a
> > new seed to disk" for this. And if that is 5h later, then that's
> > totally fine, because in the meantime it's just one bg process more that
> > hangs around waiting to do what it needs to do.
>
> Didn't you say it could also happen when using encrypted swap ? If so
> I suspect this could happen very early during boot, before any services
> may be started ?

Depends on the deps, and what options are used in /etc/crypttab. If
people hard rely on swap to be enabled for boot to proceed and also
use one-time passwords from /dev/urandom they better provide some form
of hw rng, too. Otherwise the boot will block, yes.

Basically, just add "nofail" to a line in /etc/crypttab, and the entry
will be activated at boot, but we won't delay boot for it. It's going
to be activated as soon as the deps are fulfilled (and thus the pool
initialized), but that may well be 5h after boot, and that's totally
OK as long as nothing else hard depends on it.

Lennart

--
Lennart Poettering, Berlin

* Re: Linux 5.3-rc8
  2019-09-15  7:27                             ` Ahmed S. Darwish
@ 2019-09-15  8:48                               ` Lennart Poettering
  0 siblings, 0 replies; 211+ messages in thread
From: Lennart Poettering @ 2019-09-15  8:48 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Linus Torvalds, Theodore Y. Ts'o, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On So, 15.09.19 09:27, Ahmed S. Darwish (darwish.07@gmail.com) wrote:

> On Sun, Sep 15, 2019 at 08:51:42AM +0200, Lennart Poettering wrote:
> > On Sa, 14.09.19 09:30, Linus Torvalds (torvalds@linux-foundation.org) wrote:
> [...]
> >
> > And please don't break /dev/urandom again. The above code is the only
> > way I see how we can make /dev/urandom-derived swap encryption safe,
> > and the only way I can see how we can sanely write a valid random seed
> > to disk after boot.
> >
>
> Any hope in making systemd-random-seed(8) credit that "random seed
> from previous boot" file, through RNDADDENTROPY, *by default*?

No. For two reasons:

a) It's way too late. We shouldn't credit entropy from the disk seed
   if we cannot update the disk seed with a new one at the same time,
   otherwise we might end up crediting the same seed twice on
   subsequent reboots (think: user hard powers off a system after we
   credited but before we updated), in which case there would not be a
   point in doing that at all. Hence, we have to wait until /var is
   writable, but that's relatively late during boot. Long after the
   initrd ran, long after iscsi and what not ran. Long after the
   network stack is up and so on. In a time where people load root
   images from the initrd via HTTPS that's generally too late to be
   useful.

b) Golden images are a problem. There are probably more systems
   running off golden images in the wild, than those not running off
   them. This means: a random seed on disk is only safe to credit if
   it gets purged when the image is distributed to the systems it's
   supposed to be used on, because otherwise these systems will all
   come up with the very same seed, which makes it useless. So, by
   requesting people to explicitly acknowledge that they are aware of
   this problem (and either don't use golden images, or safely wipe
   the seed off the image before shipping it), by setting the env var,
   we protect ourselves from this.

Last time I looked at it most popular distros' live images didn't wipe
the random seed properly before distributing it to users...

This is all documented btw:

https://systemd.io/RANDOM_SEEDS#systemds-support-for-filling-the-kernel-entropy-pool

See point #2.
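
For readers unfamiliar with the interface discussed here, crediting a
seed through RNDADDENTROPY could look roughly like the C sketch below.
The helper names make_pool_info() and credit_seed() are made up for
illustration and are not systemd code. The sketch also leaves out the
step that point a) insists on: the on-disk seed has to be replaced
before the old one is credited.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/random.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: build the RNDADDENTROPY argument.
 * entropy_count is in bits, buf_size is in bytes. Caller frees it. */
struct rand_pool_info *make_pool_info(const void *seed, size_t len,
                                      int credit_bits)
{
    struct rand_pool_info *info;

    /* Never claim more entropy than the seed could possibly hold. */
    assert(credit_bits >= 0 && (size_t)credit_bits <= len * 8);

    info = malloc(sizeof(*info) + len);
    if (!info)
        return NULL;
    info->entropy_count = credit_bits;
    info->buf_size = (int)len;
    memcpy(info->buf, seed, len);
    return info;
}

/* Hypothetical helper: mix the seed into the pool and credit it.
 * Returns 0 on success, -1 on error (the ioctl needs CAP_SYS_ADMIN). */
int credit_seed(const void *seed, size_t len, int credit_bits)
{
    struct rand_pool_info *info = make_pool_info(seed, len, credit_bits);
    int fd, ret;

    if (!info)
        return -1;
    fd = open("/dev/urandom", O_WRONLY);
    if (fd < 0) {
        free(info);
        return -1;
    }
    ret = ioctl(fd, RNDADDENTROPY, info);
    close(fd);
    free(info);
    return ret < 0 ? -1 : 0;
}
```

A plain write() to /dev/urandom mixes the bytes in without crediting
them, which is the safe choice whenever the seed may be shared between
machines, as in the golden image case.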

> I know that by v243, just released 12 days ago, this can be optionally
> done through SYSTEMD_RANDOM_SEED_CREDIT=1. I wonder though if it can
> ever be done by default, just like what the BSDs do... This would
> solve a big part of the current problem.

I think the best approach would be to do this in the boot loader. In
fact systemd does this in its own boot loader (sd-boot): it reads a
seed off the ESP, updates it (via a SHA256 hash of the old one)
and passes that to the OS. PID 1 very early on then credits this to
the kernel's pool (ideally the kernel would just do this on its own
btw). The trick we employ to make this generally safe is that we
persistently store a "system token" as EFI var too, and include it in
the SHA sum. The "system token" is a per-system random blob. It is
created the first time it's needed and a good random source exists,
and then stays on the system, for all future live images to use. This
makes sure that even if sloppily put together live images are used
(which do not reset any random seed) every system will use a different
series of RNG seeds.

This then solves both problems: the golden image problem, and the
early-on problem. But of course only on ESP. Other systems should be
able to provide similar mechanisms though, it's not rocket science.

This is also documented here:

https://systemd.io/RANDOM_SEEDS#systemds-support-for-filling-the-kernel-entropy-pool

See point #3...

Ideally other boot loaders (grub, …) would support the same scheme,
but I am not sure the problem set is known to them.

Lennart

--
Lennart Poettering, Berlin

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v3] random: getrandom(2): optionally block when CRNG is uninitialized
  2019-09-15  8:17                             ` [PATCH RFC v3] random: getrandom(2): optionally block when " Ahmed S. Darwish
@ 2019-09-15  8:59                               ` Lennart Poettering
  2019-09-15  9:30                                 ` Willy Tarreau
  0 siblings, 1 reply; 211+ messages in thread
From: Lennart Poettering @ 2019-09-15  8:59 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Theodore Y. Ts'o, Linus Torvalds, Alexander E. Patrakov,
	Michael Kerrisk, Willy Tarreau, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4, lkml

On So, 15.09.19 10:17, Ahmed S. Darwish (darwish.07@gmail.com) wrote:

> Thus, don't trust user-space on calling getrandom(2) from the right
> context. Never block, by default, and just return data from the
> urandom source if entropy is not yet available. This is an explicit
> decision not to let user-space work around this through busy loops on
> error-codes.
>
> Note: this lowers the quality of random data returned by getrandom(2)
> to the level of randomness returned by /dev/urandom, with all the
> original security implications coming out of that, as discussed in
> problem "3." at the top of this commit log. If this is not desirable,
> offer users a fallback to old behavior, by CONFIG_RANDOM_BLOCK=y, or
> random.getrandom_block=true bootparam.

This is an awful idea. It just means that all crypto that needs
entropy during early boot will now be using weak keys, and
doesn't even know it.

Yeah, it's a bad situation, but I am very sure that failing loudly in
this case is better than just sticking your head in the sand.
Ignoring the issue without letting userspace know is an exceptionally
bad idea.

We live in a world where people run HTTPS, SSH, and all that stuff in
the initrd already. It's where SSH host keys are generated, and plenty
session keys. If Linux lets all that stuff run with awful entropy then
you pretend things were secure while they actually aren't. It's much
better to fail loudly in that case, I am sure.

Quite frankly, I don't think this is something to fix in the
kernel. Let the people putting together systems deal with this. Let
them provide a creditable hw rng, and let them pay the price if they
don't.

Lennart

--
Lennart Poettering, Berlin

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v3] random: getrandom(2): optionally block when CRNG is uninitialized
  2019-09-15  8:59                               ` Lennart Poettering
@ 2019-09-15  9:30                                 ` Willy Tarreau
  2019-09-15 10:02                                   ` Ahmed S. Darwish
  0 siblings, 1 reply; 211+ messages in thread
From: Willy Tarreau @ 2019-09-15  9:30 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Ahmed S. Darwish, Theodore Y. Ts'o, Linus Torvalds,
	Alexander E. Patrakov, Michael Kerrisk, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4, lkml

On Sun, Sep 15, 2019 at 10:59:07AM +0200, Lennart Poettering wrote:
> We live in a world where people run HTTPS, SSH, and all that stuff in
> the initrd already. It's where SSH host keys are generated, and plenty
> session keys.

It is exactly the type of crap that creates this situation : making
people developing such scripts believe that any random source was OK
to generate these, and as such forcing urandom to produce crypto-solid
randoms! No, distro developers must know that it's not acceptable to
generate lifetime crypto keys from the early boot when no entropy is
available. At least with this change they will get an error returned
from getrandom() and will be able to ask the user to feed entropy, or
be able to say "it was impossible to generate the SSH key right now,
the daemon will only be started once it's possible", or "the SSH key
we produced will not be saved because it's not safe and is only usable
for this recovery session".

> If Linux lets all that stuff run with awful entropy then
> you pretend things were secure while they actually aren't. It's much
> better to fail loudly in that case, I am sure.

This is precisely what this change permits : fail instead of block
by default, and let applications decide based on the use case.

> Quite frankly, I don't think this is something to fix in the
> kernel.

As long as it offers a single API to return randoms, and that it is
not possible not to block for low-quality randoms, it needs to be
at least addressed there. Then userspace can adapt. For now userspace
does not have this option just due to the kernel's way of exposing
randoms.

Willy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v3] random: getrandom(2): optionally block when CRNG is uninitialized
  2019-09-15  9:30                                 ` Willy Tarreau
@ 2019-09-15 10:02                                   ` Ahmed S. Darwish
  2019-09-15 10:40                                     ` Willy Tarreau
  0 siblings, 1 reply; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-15 10:02 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Lennart Poettering, Theodore Y. Ts'o, Linus Torvalds,
	Alexander E. Patrakov, Michael Kerrisk, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4, lkml

On Sun, Sep 15, 2019 at 11:30:57AM +0200, Willy Tarreau wrote:
> On Sun, Sep 15, 2019 at 10:59:07AM +0200, Lennart Poettering wrote:
> > We live in a world where people run HTTPS, SSH, and all that stuff in
> > the initrd already. It's where SSH host keys are generated, and plenty
> > session keys.
> 
> It is exactly the type of crap that creates this situation : making
> people developing such scripts believe that any random source was OK
> to generate these, and as such forcing urandom to produce crypto-solid
> randoms!

Willy, let's tone it down please... the thread is already getting a
bit toxic.

> No, distro developers must know that it's not acceptable to
> generate lifetime crypto keys from the early boot when no entropy is
> available. At least with this change they will get an error returned
> from getrandom() and will be able to ask the user to feed entropy, or
> be able to say "it was impossible to generate the SSH key right now,
> the daemon will only be started once it's possible", or "the SSH key
> we produced will not be saved because it's not safe and is only usable
> for this recovery session".
> 
> > If Linux lets all that stuff run with awful entropy then
> > you pretend things were secure while they actually aren't. It's much
> > better to fail loudly in that case, I am sure.
> 
> This is precisely what this change permits : fail instead of block
> by default, and let applications decide based on the use case.
>

Unfortunately, not exactly.

Linus didn't want getrandom to return an error code / "to fail" in
that case, but to silently return CRNG-uninitialized /dev/urandom
data, to avoid user-space even working around the error code through
busy-loops.

I understand the rationale behind that, of course, and this is what
I've done so far in the V3 RFC.

Nonetheless, this _will_, for example, make systemd-random-seed(8)
save weak seeds under /var/lib/systemd/random-seed, since the kernel
didn't inform it about such weakness at all..

The situation is so bad now, that it's more of "some user-space are
more equal than others".. Let's just at least admit this while
discussing the RFC patch in question.

thanks,

> > Quite frankly, I don't think this is something to fix in the
> > kernel.
> 
> As long as it offers a single API to return randoms, and that it is
> not possible not to block for low-quality randoms, it needs to be
> at least addressed there. Then userspace can adapt. For now userspace
> does not have this option just due to the kernel's way of exposing
> randoms.
> 
> Willy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v3] random: getrandom(2): optionally block when CRNG is uninitialized
  2019-09-15 10:02                                   ` Ahmed S. Darwish
@ 2019-09-15 10:40                                     ` Willy Tarreau
  2019-09-15 10:55                                       ` Ahmed S. Darwish
  0 siblings, 1 reply; 211+ messages in thread
From: Willy Tarreau @ 2019-09-15 10:40 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Lennart Poettering, Theodore Y. Ts'o, Linus Torvalds,
	Alexander E. Patrakov, Michael Kerrisk, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4, lkml

On Sun, Sep 15, 2019 at 12:02:01PM +0200, Ahmed S. Darwish wrote:
> On Sun, Sep 15, 2019 at 11:30:57AM +0200, Willy Tarreau wrote:
> > On Sun, Sep 15, 2019 at 10:59:07AM +0200, Lennart Poettering wrote:
> > > We live in a world where people run HTTPS, SSH, and all that stuff in
> > > the initrd already. It's where SSH host keys are generated, and plenty
> > > session keys.
> > 
> > It is exactly the type of crap that creates this situation : making
> > people developing such scripts believe that any random source was OK
> > to generate these, and as such forcing urandom to produce crypto-solid
> > randoms!
> 
> Willy, let's tone it down please... the thread is already getting a
> bit toxic.

I don't see what's wrong in my tone above, I'm sorry if it can be
perceived as such. My point was that things such as creating lifetime
keys while there's no entropy is the wrong thing to do and what
progressively led to this situation.

> > > If Linux lets all that stuff run with awful entropy then
> > > you pretend things were secure while they actually aren't. It's much
> > > better to fail loudly in that case, I am sure.
> > 
> > This is precisely what this change permits : fail instead of block
> > by default, and let applications decide based on the use case.
> >
> 
> Unfortunately, not exactly.
> 
> Linus didn't want getrandom to return an error code / "to fail" in
> that case, but to silently return CRNG-uninitialized /dev/urandom
> data, to avoid user-space even working around the error code through
> busy-loops.

But with this EINVAL you have the information that it only filled
the buffer with whatever it could, right ? At least that was the
last point I managed to catch in the discussion. Otherwise if it's
totally silent, I fear that it will reintroduce the problem in a
different form (i.e. libc will say "our randoms are not reliable
anymore, let us work around this and produce blocking, solid randoms
again to help all our users").

> I understand the rationale behind that, of course, and this is what
> I've done so far in the V3 RFC.
> 
> Nonetheless, this _will_, for example, make systemd-random-seed(8)
> save weak seeds under /var/lib/systemd/random-seed, since the kernel
> didn't inform it about such weakness at all..

Then I am confused because I understood that the goal was to return
EINVAL or anything equivalent in which case the userspace knows what
it has to deal with :-/

Regards,
Willy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v3] random: getrandom(2): optionally block when CRNG is uninitialized
  2019-09-15 10:40                                     ` Willy Tarreau
@ 2019-09-15 10:55                                       ` Ahmed S. Darwish
  2019-09-15 11:17                                         ` Willy Tarreau
  0 siblings, 1 reply; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-15 10:55 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Lennart Poettering, Theodore Y. Ts'o, Linus Torvalds,
	Alexander E. Patrakov, Michael Kerrisk, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4, lkml

On Sun, Sep 15, 2019 at 12:40:27PM +0200, Willy Tarreau wrote:
> On Sun, Sep 15, 2019 at 12:02:01PM +0200, Ahmed S. Darwish wrote:
> > On Sun, Sep 15, 2019 at 11:30:57AM +0200, Willy Tarreau wrote:
> > > On Sun, Sep 15, 2019 at 10:59:07AM +0200, Lennart Poettering wrote:
[...]
> > > > If Linux lets all that stuff run with awful entropy then
> > > > you pretend things were secure while they actually aren't. It's much
> > > > better to fail loudly in that case, I am sure.
> > > 
> > > This is precisely what this change permits : fail instead of block
> > > by default, and let applications decide based on the use case.
> > >
> > 
> > Unfortunately, not exactly.
> > 
> > Linus didn't want getrandom to return an error code / "to fail" in
> > that case, but to silently return CRNG-uninitialized /dev/urandom
> > data, to avoid user-space even working around the error code through
> > busy-loops.
> 
> But with this EINVAL you have the information that it only filled
> the buffer with whatever it could, right ? At least that was the
> last point I managed to catch in the discussion. Otherwise if it's
> totally silent, I fear that it will reintroduce the problem in a
> different form (i.e. libc will say "our randoms are not reliable
> anymore, let us work around this and produce blocking, solid randoms
> again to help all our users").
>

V1 of the patch I posted did indeed return -EINVAL. Linus then
suggested that this might still make some user-space act smart and
just busy-loop around that, basically blocking the boot again:

    https://lkml.kernel.org/r/CAHk-=wiB0e_uGpidYHf+dV4eeT+XmG-+rQBx=JJ110R48QFFWw@mail.gmail.com
    https://lkml.kernel.org/r/CAHk-=whSbo=dBiqozLoa6TFmMgbeB8d9krXXvXBKtpRWkG0rMQ@mail.gmail.com

So it was then requested to actually return what /dev/urandom would
return, so that user-space has no way whatsoever in knowing if
getrandom has failed. Then, it's the job of system integrators / BSP
builders to inspect the big fat WARN from the kernel and fix
that.

This is the core of Lennart's critique of V3 above.

> > I understand the rationale behind that, of course, and this is what
> > I've done so far in the V3 RFC.
> > 
> > Nonetheless, this _will_, for example, make systemd-random-seed(8)
> > save weak seeds under /var/lib/systemd/random-seed, since the kernel
> > didn't inform it about such weakness at all..
> 
> Then I am confused because I understood that the goal was to return
> EINVAL or anything equivalent in which case the userspace knows what
> it has to deal with :-/
>

Yeah, the discussion moved a bit beyond that.

thanks,
--darwi

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v3] random: getrandom(2): optionally block when CRNG is uninitialized
  2019-09-15 10:55                                       ` Ahmed S. Darwish
@ 2019-09-15 11:17                                         ` Willy Tarreau
  0 siblings, 0 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-15 11:17 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Lennart Poettering, Theodore Y. Ts'o, Linus Torvalds,
	Alexander E. Patrakov, Michael Kerrisk, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4, lkml

On Sun, Sep 15, 2019 at 12:55:39PM +0200, Ahmed S. Darwish wrote:
> On Sun, Sep 15, 2019 at 12:40:27PM +0200, Willy Tarreau wrote:
> > On Sun, Sep 15, 2019 at 12:02:01PM +0200, Ahmed S. Darwish wrote:
> > > On Sun, Sep 15, 2019 at 11:30:57AM +0200, Willy Tarreau wrote:
> > > > On Sun, Sep 15, 2019 at 10:59:07AM +0200, Lennart Poettering wrote:
> [...]
> > > > > If Linux lets all that stuff run with awful entropy then
> > > > > you pretend things were secure while they actually aren't. It's much
> > > > > better to fail loudly in that case, I am sure.
> > > > 
> > > > This is precisely what this change permits : fail instead of block
> > > > by default, and let applications decide based on the use case.
> > > >
> > > 
> > > Unfortunately, not exactly.
> > > 
> > > Linus didn't want getrandom to return an error code / "to fail" in
> > > that case, but to silently return CRNG-uninitialized /dev/urandom
> > > data, to avoid user-space even working around the error code through
> > > busy-loops.
> > 
> > But with this EINVAL you have the information that it only filled
> > the buffer with whatever it could, right ? At least that was the
> > last point I managed to catch in the discussion. Otherwise if it's
> > totally silent, I fear that it will reintroduce the problem in a
> > different form (i.e. libc will say "our randoms are not reliable
> > anymore, let us work around this and produce blocking, solid randoms
> > again to help all our users").
> >
> 
> V1 of the patch I posted did indeed return -EINVAL. Linus then
> suggested that this might still make some user-space act smart and
> just busy-loop around that, basically blocking the boot again:
> 
>     https://lkml.kernel.org/r/CAHk-=wiB0e_uGpidYHf+dV4eeT+XmG-+rQBx=JJ110R48QFFWw@mail.gmail.com
>     https://lkml.kernel.org/r/CAHk-=whSbo=dBiqozLoa6TFmMgbeB8d9krXXvXBKtpRWkG0rMQ@mail.gmail.com
> 
> So it was then requested to actually return what /dev/urandom would
> return, so that user-space has no way whatsoever in knowing if
> getrandom has failed. Then, it's the job of system integrators / BSP
> builders to inspect the big fat WARN from the kernel and fix
> that.

Then I was indeed a bit confused in the middle of the discussion as
I didn't understand exactly this, thanks for clarifying :-)

But does it still block when called with GRND_RANDOM ? If so I guess
I'm fine as it translates exactly the previous behavior of random vs
urandom, and that GRND_NONBLOCK allows the application to fall back
to reliable sources if needed (typically human interactions).
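
As an illustration of the GRND_NONBLOCK fallback mentioned above, a
caller could probe the pool once and decide what to do based on the
result. This is a minimal sketch against the getrandom(2) behavior as
documented at the time of this thread, not against the RFC patch; the
enum and function names are invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical outcome codes for the example. */
enum seed_status { SEED_OK, SEED_NOT_READY, SEED_ERROR };

/* Map a getrandom() return value and errno to a decision. */
enum seed_status classify_result(ssize_t ret, int err, size_t want)
{
    if (ret == (ssize_t)want)
        return SEED_OK;          /* buffer completely filled */
    if (ret < 0 && err == EAGAIN)
        return SEED_NOT_READY;   /* CRNG not initialized yet */
    return SEED_ERROR;           /* short read or another errno */
}

/* Ask for key material without ever blocking. Requests of up to 256
 * bytes are documented not to return short once the pool is ready. */
enum seed_status try_get_key_material(void *buf, size_t len)
{
    ssize_t ret;

    assert(len <= 256);
    ret = getrandom(buf, len, GRND_NONBLOCK);
    return classify_result(ret, ret < 0 ? errno : 0, len);
}
```

On SEED_NOT_READY the application, not the kernel, chooses the next
step: defer key generation, ask the user for input, or mark the key as
temporary.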

Thanks,
Willy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-15  6:51                           ` Lennart Poettering
  2019-09-15  7:27                             ` Ahmed S. Darwish
@ 2019-09-15 16:29                             ` Linus Torvalds
  2019-09-16  1:40                               ` Ahmed S. Darwish
  1 sibling, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-15 16:29 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Ahmed S. Darwish, Theodore Y. Ts'o, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Sat, Sep 14, 2019 at 11:51 PM Lennart Poettering
<mzxreary@0pointer.de> wrote:
>
> Oh man. Just spend 5min to understand the situation, before claiming
> this was garbage or that was garbage. The code above does not block
> boot.

Yes it does. You clearly didn't read the thread.

> It blocks startup of services that explicit order themselves
> after the code above. There's only a few services that should do that,
> and the main system boots up just fine without waiting for this.

That's a nice theory, but it doesn't actually match reality.

There are clearly broken setups that use this for things that it
really shouldn't be used for. Asking for true randomness at boot
before there is any indication that randomness exists, and then just
blocking with no further action that could actually _generate_ said
randomness.

If your description was true that the system would come up and be
usable while the blocked thread is waiting for that to happen, things
would be fine.

But that simply isn't the case.

                  Linus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-15  6:56                               ` Lennart Poettering
  2019-09-15  7:01                                 ` Willy Tarreau
@ 2019-09-15 17:02                                 ` Linus Torvalds
  2019-09-16  3:23                                   ` Theodore Y. Ts'o
  1 sibling, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-15 17:02 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Alexander E. Patrakov, Ahmed S. Darwish, Theodore Y. Ts'o,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	zhangjs, linux-ext4, lkml

On Sat, Sep 14, 2019 at 11:56 PM Lennart Poettering
<mzxreary@0pointer.de> wrote:
>
> I am not expecting the kernel to guarantee entropy. I'm just expecting
> the kernel to not give me garbage knowingly. It's OK if it gives me
> garbage unknowingly, but I have a problem if it gives me trash all the
> time.

So realistically, we never actually give you *garbage*.

It's just that we try very hard to actually give you some entropy
guarantees, and that we can't always do in a timely manner -
particularly if you don't help.

But on a PC, we can _almost_ guarantee entropy. Even with a golden
image, we do mix in:

 - timestamp counter on every device interrupt (but "device interrupt"
doesn't include things like the local CPU timer, so it really needs
device activity)

 - random boot and BIOS memory (dmi tables, the EFI RNG entry, etc)

 - various device state (things like MAC addresses when registering
network devices, USB device numbers, etc)

 - and obviously any CPU rdrand data

and note the "mix in" part - it's all designed so that you don't trust
any of this for randomness on its own, but very much hopefully it
means that almost *any* differences in boot environment will add a
fair amount of unpredictable behavior.

But also note the "on a PC" part.

Also note that as far as the kernel is concerned, none of the above
counts as "entropy" for us, except to a very small degree the device
interrupt timing thing. But you need hundreds of interrupts for that
to be considered really sufficient.

And that's why things broke. It turns out that making ext4 be more
efficient at boot caused fewer disk interrupts, and now we weren't
convinced we had sufficient entropy. And the systemd boot thing just
*stopped* waiting for entropy to magically appear, which it never will
if the machine is idle and not doing anything.

So do we give you "garbage" in getrandom()? We try really really hard
not to, but it's exactly the "can we _guarantee_ that it has entropy"
that ends up being the problem.

So if some silly early boot process comes along, and asks for "true
randomness", and just blocks for it without doing anything else,
that's broken from a kernel perspective.

In practice, the only situation we have had really big problems with
not giving "garbage" isn't actually the "golden distro image" case you
talk about. It's the "embedded device golden _system_ image" case,
where the image isn't just the distribution, but the full bootloader
state.

Some cheap embedded MIPS CPU without even a timestamp counter, with
identical flash contents for millions of devices, and doing a "on
first boot, generate a long-term key" without even connecting to the
network first.

That's the thing Ted was pointing at:

    https://factorable.net/weakkeys12.extended.pdf

so yes, it can be "garbage", but it can be garbage only if you really
really do things entirely wrong.

But basically, you should never *ever* try to generate some long-lived
key and then just wait for it without doing anything else. The
"without doing anything else" is key here.

But every time we've had a blocking interface, that's exactly what
somebody has done. Which is why I consider that long blocking thing to
be completely unacceptable. There is no reason to believe that the
wait will ever end, partly exactly because we don't consider timer
interrupts to add any timer randomness. So if you are just waiting,
nothing necessarily ever happens.

                 Linus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized
  2019-09-15  5:22                           ` [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized Theodore Y. Ts'o
  2019-09-15  8:17                             ` [PATCH RFC v3] random: getrandom(2): optionally block when " Ahmed S. Darwish
@ 2019-09-15 17:32                             ` Linus Torvalds
  2019-09-15 18:32                               ` Willy Tarreau
                                                 ` (2 more replies)
  1 sibling, 3 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-15 17:32 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Alexander E. Patrakov, Ahmed S. Darwish, Michael Kerrisk,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	zhangjs, linux-ext4, lkml, Lennart Poettering


[-- Attachment #1: Type: text/plain, Size: 4454 bytes --]

[ Added Lennart, who was active in the other thread ]

On Sat, Sep 14, 2019 at 10:22 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> Thus, add an optional configuration option which stops getrandom(2)
> from blocking, but instead returns "best efforts" randomness, which
> might not be random or secure at all.

So I hate having a config option for something like this.

How about this attached patch instead? It only changes the waiting
logic, and I'll quote the comment in full, because I think that
explains not only the rationale, it explains every part of the patch
(and is most of the patch anyway):

 * We refuse to wait very long for a blocking getrandom().
 *
 * The crng may not be ready during boot, but if you ask for
 * blocking random numbers very early, there is no guarantee
 * that you'll ever get any timely entropy.
 *
 * If you are sure you need entropy and that you can generate
 * it, you need to ask for non-blocking random state, and then
 * if that fails you must actively _do_something_ that causes
 * enough system activity, perhaps asking the user to type
 * something on the keyboard.
 *
 * Just asking for blocking random numbers is completely and
 * fundamentally wrong, and the kernel will not play that game.
 *
 * We will block for at most 15 seconds at a time, and if called
 * sequentially will decrease the blocking amount so that we'll
 * block for at most 30s total - and if people continue to ask
 * for blocking, at that point we'll just return whatever random
 * state we have acquired.
 *
 * This will also complain loudly if the timeout happens, to let
 * the distribution or system admin know about the problem.
 *
 * The process that gets the -EAGAIN will hopefully also log the
 * error, to raise awareness that there may be use of random
 * numbers without sufficient entropy.

Hmm? No strange behavior. No odd config variables. A bounded total
boot-time wait of 30s (which is a completely random number, but I
claimed it as the "big red button" time).

And if you only do it once and fall back to something else it will
only wait for 15s, and you'll have your error value so that you can
log it properly.
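
One way to read the "15 seconds at a time, 30s total" rule quoted above
is a budget that is halved on every blocking call. The C model below is
only a sketch of that reading; it is not the attached patch, and the
names and the millisecond granularity are assumptions made for the
example.

```c
#include <assert.h>

#define TOTAL_BUDGET_MS 30000   /* assumed total boot-time wait budget */

/* Hypothetical global state shared by all blocking callers. */
struct wait_budget {
    int remaining_ms;
};

/* Return how long the next blocking call may wait, in milliseconds.
 * Each call takes half of what is left, so the waits go 15s, 7.5s,
 * 3.75s, ... and their sum never exceeds the total budget. A return
 * value of 0 means the budget is spent and the caller would get
 * best-effort data instead of blocking. */
int next_wait_ms(struct wait_budget *b)
{
    int wait;

    assert(b->remaining_ms >= 0);
    wait = b->remaining_ms / 2;
    b->remaining_ms -= wait;
    return wait;
}
```

A caller that blocks once and then falls back to something else pays
for the first 15s slice only, which matches the single-wait case
described above.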

Yes, a single boot-time wait of 15s at boot is still "darn annoying",
but it likely

 (a) isn't so long that people consider it a boot failure and give up
(but hopefully annoying enough that they'll report it)

 (b) long enough that *if* the thing that is waiting is not actually
blocking the boot sequence, the non-blocked part of the boot sequence
should have time to do sufficient IO to get better randomness.

So (a) is the "the system is still usable" part. While (b) is the
"give it a chance, and even if it fails and you fall back on urandom
or whatever, you'll actually be getting good randomness even if we
can't perhaps _guarantee_ entropy".

Also, if you have some user that wants to do the old-timey ssh-keygen
thing with user input etc, we now have a documented way to do that:
just do the nonblocking thing, and then make really really sure that
you actually have something that generates more entropy if that
nonblocking thing returns EAGAIN. But it's also very clear that at
that point the program that wants this entropy guarantee has to _work_
for it.

Because just being lazy and saying "block" without any entropy will
return EAGAIN for a (continually decreasing) while, but then at some
point stop and say "you're broken", and just give you the urandom
data.

Because if you really do nothing at all, and there is no activity
what-so-ever for 15s because you blocked the boot, then I claim that
it's better to return an error than to wait forever. And if you ignore
the error and just retry, eventually we'll do the fallback for you.

Of course, if you have something like rdrand, and told us you trust
it, none of this matters at all, since we'll have initialized the pool
long before.

So this is unconditional, but it's basically "unconditionally somewhat
flexibly reasonable". It should only ever trigger for the case where
the boot sequence was fundamentally broken. And it will complain
loudly (both at a kernel level, and hopefully at a systemd journal
level too) if it ever triggers.

And hey, if some distro wants to then revert this because they feel
uncomfortable with this, that's now _their_ problem, not the problem
of the upstream kernel. The upstream kernel tries to do something that
I think is arguably fairly reasonable in all situations.

                 Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/x-patch, Size: 2728 bytes --]

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized
  2019-09-15 17:32                             ` [PATCH RFC v2] random: optionally block in getrandom(2) when the " Linus Torvalds
@ 2019-09-15 18:32                               ` Willy Tarreau
  2019-09-15 18:36                                 ` Willy Tarreau
  2019-09-15 18:59                                 ` Linus Torvalds
  2019-09-16 18:08                               ` Lennart Poettering
  2019-09-18 21:15                               ` [PATCH RFC v4 0/1] random: WARN on large getrandom() waits and introduce getrandom2() Ahmed S. Darwish
  2 siblings, 2 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-15 18:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Alexander E. Patrakov, Ahmed S. Darwish,
	Michael Kerrisk, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml,
	Lennart Poettering

On Sun, Sep 15, 2019 at 10:32:15AM -0700, Linus Torvalds wrote:
>  * We will block for at most 15 seconds at a time, and if called
>  * sequentially will decrease the blocking amount so that we'll
>  * block for at most 30s total - and if people continue to ask
>  * for blocking, at that point we'll just return whatever random
>  * state we have acquired.

I think that the exponential decay will either not be used or
be totally used, so in practice you'll always end up with 0 or
30s depending on the entropy situation, because I really do not
see any valid reason for entropy to suddenly start to appear
after 15s if it didn't prior to this. As such I do think that
a single timeout should be enough.
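
The arithmetic behind the two figures can be sketched as follows. This
assumes each successive call halves the previous wait, which is one way
to get "15s at a time, 30s total"; the quoted comment only says the
amount decreases, so the factor of two and the name blocking_waits are
assumptions.

```python
def blocking_waits(calls, first_wait=15.0):
    """Per-call maximum waits, assuming each call halves the previous one."""
    waits = []
    wait = first_wait
    for _ in range(calls):
        waits.append(wait)
        wait /= 2  # assumed decay factor; the comment only says "decrease"
    return waits


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "calls ->", sum(blocking_waits(n)), "s in the worst case")
```

Under that assumption, a caller that tries once and falls back waits at
most 15s, while a caller that keeps retrying approaches, but never
reaches, 30s.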

In addition, since you're leaving the door open to bikeshed around
the timeout value, I'd say that while 30s is usually not huge in a
desktop system's life, it actually is a lot in network environments
when it delays a switchover. It can cause other timeouts to occur
and leave quite a long embarrassing black out. I'd guess that a max
total wait time of 2-3s should be OK though since application timeouts
rarely are lower due to TCP generally starting to retransmit at 3s.
And even in 3s we're supposed to see quite some interrupts or it's
unlikely that much more will happen between 3 and 30s.

If the setting had to be made user-changeable then it could make
sense to let it be overridden on the kernel's command line though
I don't think that it should be necessary with a low enough value.

Thanks,
Willy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized
  2019-09-15 18:32                               ` Willy Tarreau
@ 2019-09-15 18:36                                 ` Willy Tarreau
  2019-09-15 19:08                                   ` Linus Torvalds
  2019-09-15 18:59                                 ` Linus Torvalds
  1 sibling, 1 reply; 211+ messages in thread
From: Willy Tarreau @ 2019-09-15 18:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Alexander E. Patrakov, Ahmed S. Darwish,
	Michael Kerrisk, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml,
	Lennart Poettering

I also wanted to ask, are we going to enforce the same strategy on
/dev/urandom ? If we don't because we fear application breakage or
whatever, then there will always be some incentive against migrating
to getrandom(). And if we do it, we know we have to take a reasonable
approach making the change transparent enough for applications. That
would also argue in favor of a short timeout.

Willy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized
  2019-09-15 18:32                               ` Willy Tarreau
  2019-09-15 18:36                                 ` Willy Tarreau
@ 2019-09-15 18:59                                 ` Linus Torvalds
  2019-09-15 19:12                                   ` Willy Tarreau
  2019-09-16  2:45                                   ` Ahmed S. Darwish
  1 sibling, 2 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-15 18:59 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Theodore Y. Ts'o, Alexander E. Patrakov, Ahmed S. Darwish,
	Michael Kerrisk, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml,
	Lennart Poettering

On Sun, Sep 15, 2019 at 11:32 AM Willy Tarreau <w@1wt.eu> wrote:
>
> I think that the exponential decay will either not be used or
> be totally used, so in practice you'll always end up with 0 or
> 30s depending on the entropy situation

According to the systemd random-seed source snippet that Ahmed posted,
it actually just tries once (well, first once non-blocking, then once
blocking) and then falls back to reading urandom if it fails.

So assuming there's just one of those "read much too early" cases, I
think it actually matters.

But while I tried to test this, on my F30 install, systemd seems to
always just use urandom().

I can trigger the urandom read warning easily enough (turn off CPU
rdrand trusting and increase the entropy requirement by a factor of
ten, and turn off the ioctl to add entropy from user space), just not
the getrandom() blocking case at all.

So presumably that's because I have a systemd that doesn't use
getrandom() at all, or perhaps uses the 'rdrand' instruction directly.
Or maybe because Arch has some other oddity that just triggers the
problem.

> In addition, since you're leaving the door open to bikeshed around
> the timeout value, I'd say that while 30s is usually not huge in a
> desktop system's life, it actually is a lot in network environments
> when it delays a switchover.

Oh, absolutely.

But in that situation you have a MIS person on call, and somebody who
can fix it.

It's not like switchovers happen in a vacuum. What we should care
about is that updating a kernel _works_. No regressions. But if you
have some five-nines setup with switchover, you'd better have some
competent MIS people there too. You don't just switch kernels without
testing ;)

                 Linus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized
  2019-09-15 18:36                                 ` Willy Tarreau
@ 2019-09-15 19:08                                   ` Linus Torvalds
  2019-09-15 19:18                                     ` Willy Tarreau
  0 siblings, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-15 19:08 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Theodore Y. Ts'o, Alexander E. Patrakov, Ahmed S. Darwish,
	Michael Kerrisk, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml,
	Lennart Poettering

On Sun, Sep 15, 2019 at 11:37 AM Willy Tarreau <w@1wt.eu> wrote:
>
> I also wanted to ask, are we going to enforce the same strategy on
> /dev/urandom ?

Right now the strategy for /dev/urandom is "print a one-line warning,
then do the read".

I don't see why we should change that. The whole point of urandom has
been that it doesn't block, and doesn't use up entropy.

It's the _blocking_ behavior that has always been problematic. It's
why almost nobody uses /dev/random in practice.

getrandom() looks like /dev/urandom in not using up entropy, but had
that blocking behavior of /dev/random that was problematic.

And exactly the same way it was problematic for /dev/random users, it
has now shown itself to be problematic for getrandom().

My suggested patch left the /dev/random blocking behavior, because
hopefully people *know* about the problems there.

And hopefully people understand that getrandom(GRND_RANDOM) has all
the same issues.

If you want that behavior, you can still use GRND_RANDOM or
/dev/random, but they are simply not acceptable for boot-time
scenarios. Never have been,

... exactly the way the "block forever" wasn't acceptable for getrandom().

                Linus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized
  2019-09-15 18:59                                 ` Linus Torvalds
@ 2019-09-15 19:12                                   ` Willy Tarreau
  2019-09-16  2:45                                   ` Ahmed S. Darwish
  1 sibling, 0 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-15 19:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Alexander E. Patrakov, Ahmed S. Darwish,
	Michael Kerrisk, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml,
	Lennart Poettering

On Sun, Sep 15, 2019 at 11:59:41AM -0700, Linus Torvalds wrote:
> > In addition, since you're leaving the door open to bikeshed around
> > the timeout value, I'd say that while 30s is usually not huge in a
> > desktop system's life, it actually is a lot in network environments
> > when it delays a switchover.
> 
> Oh, absolutely.
> 
> But in that situation you have a MIS person on call, and somebody who
> can fix it.
> 
> It's not like switchovers happen in a vacuum. What we should care
> about is that updating a kernel _works_. No regressions. But if you
> have some five-nines setup with switchover, you'd better have some
> competent MIS people there too. You don't just switch kernels without
> testing ;)

I mean maybe I didn't use the right term, but typically in networked
environments you'll have watchdogs on sensitive devices (e.g. the
default gateways and load balancers), which will trigger an instant
reboot of the system if something really bad happens. It can range
from a dirty oops, FS remounted R/O, pure freeze, OOM, missing
process, panic etc. And here the reset which used to take roughly
10s to get the whole services back up for operations suddenly takes
40s. My point is that I won't have issues explaining to users that 10s
or 13s is the same when they rely on five nines, but trying to argue
that 40s is identical to 10s will be a hard position to stand by.

And actually there are other dirty cases. Such systems often work
in active-backup or active-active modes. One typical issue is that
the primary system reboots, the second takes over within one second,
and once the primary system is back *apparently* operating, some
processes which appear to be present and which possibly have already
bound their listening ports are waiting for 30s in getrandom() while
the monitoring systems around see them as ready, thus the primary
machine goes back to its role and cannot reliably run the service
for the first 30 seconds, which roughly multiplies the downtime by
30. That's why I'd like to make it possible to lower this value
(either definitely or by cmdline, as I think it can be fine for
all those who care about down time).

Willy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized
  2019-09-15 19:08                                   ` Linus Torvalds
@ 2019-09-15 19:18                                     ` Willy Tarreau
  2019-09-15 19:31                                       ` Linus Torvalds
  0 siblings, 1 reply; 211+ messages in thread
From: Willy Tarreau @ 2019-09-15 19:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Alexander E. Patrakov, Ahmed S. Darwish,
	Michael Kerrisk, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml,
	Lennart Poettering

On Sun, Sep 15, 2019 at 12:08:31PM -0700, Linus Torvalds wrote:
> My suggested patch left the /dev/random blocking behavior, because
> hopefully people *know* about the problems there.
> 
> And hopefully people understand that getrandom(GRND_RANDOM) has all
> the same issues.

I think this one doesn't cause any issue to users. It's the only
one that should be used for long-lived crypto keys in my opinion.

> If you want that behavior, you can still use GRND_RANDOM or
> /dev/random, but they are simply not acceptable for boot-time
> scenarios.

Oh no I definitely don't want this behavior at all for urandom, what
I'm saying is that as long as getrandom() will have a lower quality
of service than /dev/urandom for non-important randoms, there will be
compelling reasons to avoid it. And I think that your bounded wait
could actually reconcile both ends of the user spectrum, those
who want excellent randoms to run tetris and those who don't mind
always playing the same game on every boot because they just want
to play. And by making /dev/urandom behave like getrandom() we could
actually tell users "both are now exactly the same, you have no valid
reason anymore not to use the new API". And it forces us to remain
very reasonable in getrandom() so that we don't break old applications
that relied on urandom to be fast.

Willy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized
  2019-09-15 19:18                                     ` Willy Tarreau
@ 2019-09-15 19:31                                       ` Linus Torvalds
  2019-09-15 19:54                                         ` Willy Tarreau
  0 siblings, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-15 19:31 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Theodore Y. Ts'o, Alexander E. Patrakov, Ahmed S. Darwish,
	Michael Kerrisk, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml,
	Lennart Poettering

On Sun, Sep 15, 2019 at 12:18 PM Willy Tarreau <w@1wt.eu> wrote:
>
> Oh no I definitely don't want this behavior at all for urandom, what
> I'm saying is that as long as getrandom() will have a lower quality
> of service than /dev/urandom for non-important randoms

Ahh, here you're talking about the fact that it can block at all being
"lower quality".

I do agree that getrandom() is doing some odd things. It has the
"total blocking mode" of /dev/random (if you pass it GRND_RANDOM), but
it has no mode of replacing /dev/urandom.

So if you want the /dev/urandom behavior, then no, getrandom() simply
has never given you that.

Use /dev/urandom if you want that.

Sad, but there it is. We could have a new flag (GRND_URANDOM) that
actually gives the /dev/urandom behavior. But the ostensible reason
for getrandom() was the blocking for entropy. See commit c6e9d6f38894
("random: introduce getrandom(2) system call") from back in 2014.

The fact that it took five years to hit this problem is probably due
to two reasons:

 (a) we're actually pretty good about initializing the entropy pool
fairly quickly most of the time

 (b) people who started using 'getrandom()' and hit this issue
presumably then backed away from it slowly and just used /dev/urandom
instead.

So it needed an actual "oops, we don't get as much entropy from the
filesystem accesses" situation to actually turn into a problem. And
presumably the people who tried out things like nvdimm filesystems
never used Arch, and never used a sufficiently new systemd to see the
"oh, without disk interrupts you don't get enough randomness to boot".

One option is to just say that GRND_URANDOM is the default (ie never
block, do the one-liner log entry to warn) and add a _new_ flag that
says "block for entropy". But if we do that, then I seriously think
that the new behavior should have that timeout limiter.

For 5.3, I'll just revert the ext4 change, stupid as that is. That
avoids the regression, even if it doesn't avoid the fundamental
problem. And gives us time to discuss it.

                 Linus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized
  2019-09-15 19:31                                       ` Linus Torvalds
@ 2019-09-15 19:54                                         ` Willy Tarreau
  0 siblings, 0 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-15 19:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Alexander E. Patrakov, Ahmed S. Darwish,
	Michael Kerrisk, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml,
	Lennart Poettering

On Sun, Sep 15, 2019 at 12:31:42PM -0700, Linus Torvalds wrote:
> On Sun, Sep 15, 2019 at 12:18 PM Willy Tarreau <w@1wt.eu> wrote:
> >
> > Oh no I definitely don't want this behavior at all for urandom, what
> > I'm saying is that as long as getrandom() will have a lower quality
> > of service than /dev/urandom for non-important randoms
> 
> Ahh, here you're talking about the fact that it can block at all being
> "lower quality".
> 
> I do agree that getrandom() is doing some odd things. It has the
> "total blocking mode" of /dev/random (if you pass it GRND_RANDOM), but
> it has no mode of replacing /dev/urandom.

Yep but with your change it's getting better.

> So if you want the /dev/urandom behavior, then no, getrandom() simply
> has never given you that.
> 
> Use /dev/urandom if you want that.

It's not available in chroot, which is the main driver for getrandom()
I guess.

> Sad, but there it is. We could have a new flag (GRND_URANDOM) that
> actually gives the /dev/urandom behavior. But the ostensible reason
> for getrandom() was the blocking for entropy. See commit c6e9d6f38894
> ("random: introduce getrandom(2) system call") from back in 2014.

Oh I definitely know it's been a long debate.

> The fact that it took five years to hit this problem is probably due
> to two reasons:
> 
>  (a) we're actually pretty good about initializing the entropy pool
> fairly quickly most of the time
> 
>  (b) people who started using 'getrandom()' and hit this issue
> presumably then backed away from it slowly and just used /dev/urandom
> instead.

We've hit it the hard way more than a year ago already, when openssl
adopted getrandom() instead of urandom for certain low-importance
things in order to work better in chroots and/or avoid fd leaks. And
even openssl had to work around these issues in multiple iterations
(I don't remember how however).

> So it needed an actual "oops, we don't get as much entropy from the
> filesystem accesses" situation to actually turn into a problem. And
> presumably the people who tried out things like nvdimm filesystems
> never used Arch, and never used a sufficiently new systemd to see the
> "oh, without disk interrupts you don't get enough randomness to boot".

In my case the whole system is in the initramfs and the only accesses
to the flash are to read the config. So that's a pretty limited source
of interrupts for a headless system ;-)

> One option is to just say that GRND_URANDOM is the default (ie never
> block, do the one-liner log entry to warn) and add a _new_ flag that
> says "block for entropy". But if we do that, then I seriously think
> that the new behavior should have that timeout limiter.

I think the timeout is a good thing to do, but it would be nice to
let the application know that what was provided was probably not as
good as expected (well if the application wants real random, it
should use GRND_RANDOM).

> For 5.3, I'll just revert the ext4 change, stupid as that is. That
> avoids the regression, even if it doesn't avoid the fundamental
> problem. And gives us time to discuss it.

It's sad to see that being excessive on randomness leads to forcing
a totally unrelated subsystem to be less efficient :-(

Willy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-15 16:29                             ` Linus Torvalds
@ 2019-09-16  1:40                               ` Ahmed S. Darwish
  2019-09-16  1:48                                 ` Vito Caputo
  2019-09-16  3:31                                 ` Linus Torvalds
  0 siblings, 2 replies; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-16  1:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Lennart Poettering, Theodore Y. Ts'o, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

On Sun, Sep 15, 2019 at 09:29:55AM -0700, Linus Torvalds wrote:
> On Sat, Sep 14, 2019 at 11:51 PM Lennart Poettering
> <mzxreary@0pointer.de> wrote:
> >
> > Oh man. Just spend 5min to understand the situation, before claiming
> > this was garbage or that was garbage. The code above does not block
> > boot.
> 
> Yes it does. You clearly didn't read the thread.
> 
> > It blocks startup of services that explicitly order themselves
> > after the code above. There's only a few services that should do that,
> > and the main system boots up just fine without waiting for this.
> 
> That's a nice theory, but it doesn't actually match reality.
> 
> There are clearly broken setups that use this for things that it
> really shouldn't be used for. Asking for true randomness at boot
> before there is any indication that randomness exists, and then just
> blocking with no further action that could actually _generate_ said
> randomness.
> 
> If your description was true that the system would come up and be
> usable while the blocked thread is waiting for that to happen, things
> would be fine.
>

A small note here, especially after I've just read the commit log of
72dbcf721566 ('Revert ext4: "make __ext4_get_inode_loc plug"'), which
unfairly blames systemd there.

Yes, the systemd-random-seed(8) process blocks, but this is an
isolated process, and it's only there as a synchronization point and
to load/restore random seeds from disk across reboots.

The wisdom of having a synchronization service ("before/after urandom
CRNG is inited") can be debated. That service though, and systemd in
general, did _not_ block the overall system boot.

What blocked the system boot was GDM/gnome-session implicitly calling
getrandom() for the Xorg MIT cookie. This was shown in the strace log
below:

   https://lkml.kernel.org/r/20190910173243.GA3992@darwi-home-pc

thanks,

-- 
darwi
http://darwish.chasingpointers.com

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-16  1:40                               ` Ahmed S. Darwish
@ 2019-09-16  1:48                                 ` Vito Caputo
  2019-09-16  2:49                                   ` Theodore Y. Ts'o
  2019-09-16  3:31                                 ` Linus Torvalds
  1 sibling, 1 reply; 211+ messages in thread
From: Vito Caputo @ 2019-09-16  1:48 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Linus Torvalds, Lennart Poettering, Theodore Y. Ts'o,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml

On Mon, Sep 16, 2019 at 03:40:50AM +0200, Ahmed S. Darwish wrote:
> On Sun, Sep 15, 2019 at 09:29:55AM -0700, Linus Torvalds wrote:
> > On Sat, Sep 14, 2019 at 11:51 PM Lennart Poettering
> > <mzxreary@0pointer.de> wrote:
> > >
> > > Oh man. Just spend 5min to understand the situation, before claiming
> > > this was garbage or that was garbage. The code above does not block
> > > boot.
> > 
> > Yes it does. You clearly didn't read the thread.
> > 
> > > It blocks startup of services that explicitly order themselves
> > > after the code above. There's only a few services that should do that,
> > > and the main system boots up just fine without waiting for this.
> > 
> > That's a nice theory, but it doesn't actually match reality.
> > 
> > There are clearly broken setups that use this for things that it
> > really shouldn't be used for. Asking for true randomness at boot
> > before there is any indication that randomness exists, and then just
> > blocking with no further action that could actually _generate_ said
> > randomness.
> > 
> > If your description was true that the system would come up and be
> > usable while the blocked thread is waiting for that to happen, things
> > would be fine.
> >
> 
> A small note here, especially after I've just read the commit log of
> 72dbcf721566 ('Revert ext4: "make __ext4_get_inode_loc plug"'), which
> unfairly blames systemd there.
> 
> Yes, the systemd-random-seed(8) process blocks, but this is an
> isolated process, and it's only there as a synchronization point and
> to load/restore random seeds from disk across reboots.
> 
> The wisdom of having a synchronization service ("before/after urandom
> CRNG is inited") can be debated. That service though, and systemd in
> general, did _not_ block the overall system boot.
> 
> What blocked the system boot was GDM/gnome-session implicitly calling
> getrandom() for the Xorg MIT cookie. This was shown in the strace log
> below:
> 
>    https://lkml.kernel.org/r/20190910173243.GA3992@darwi-home-pc
> 

So did systemd-random-seed instead drain what little entropy there was
before GDM started, increasing the likelihood a subsequent getrandom()
call would block?

Regards,
Vito Caputo

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized
  2019-09-15 18:59                                 ` Linus Torvalds
  2019-09-15 19:12                                   ` Willy Tarreau
@ 2019-09-16  2:45                                   ` Ahmed S. Darwish
  1 sibling, 0 replies; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-16  2:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Willy Tarreau, Theodore Y. Ts'o, Alexander E. Patrakov,
	Michael Kerrisk, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml,
	Lennart Poettering

On Sun, Sep 15, 2019 at 11:59:41AM -0700, Linus Torvalds wrote:
> On Sun, Sep 15, 2019 at 11:32 AM Willy Tarreau <w@1wt.eu> wrote:
> >
> > I think that the exponential decay will either not be used or
> > be totally used, so in practice you'll always end up with 0 or
> > 30s depending on the entropy situation
> 
> According to the systemd random-seed source snippet that Ahmed posted,
> it actually just tries once (well, first once non-blocking, then once
> blocking) and then falls back to reading urandom if it fails.
> 
> So assuming there's just one of those "read much too early" cases, I
> think it actually matters.
>

Just a quick note, the snippet I posted:

    https://lkml.kernel.org/r/20190914150206.GA2270@darwi-home-pc

is not PID 1.

It's just a lowly process called "systemd-random-seed". Its main
reason for existence is to load/restore a random seed file from and to
disk across reboots (just like what sysv scripts did).

The reason I posted it was to show that if we change getrandom() to
silently return weak crypto instead of blocking or an error code,
systemd-random-seed will break: it will save the resulting data to
disk, then even _credit_ it (if asked to) in the next boot cycle
through RNDADDENTROPY.
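
A sketch of why the caller needs to know which case it got. This is a
hypothetical helper, not systemd's actual code: new_seed and its
parameters are illustrative names, and the second return value stands
in for the decision of whether the saved seed may later be credited.

```python
import os


def _read_nonblocking(n):
    # Raises BlockingIOError (EAGAIN) while the CRNG is uninitialized.
    return os.getrandom(n, os.GRND_NONBLOCK)


def _read_urandom(n):
    # Reading the device node never blocks, but gives no quality guarantee
    # before the CRNG is initialized.
    with open("/dev/urandom", "rb") as f:
        return f.read(n)


def new_seed(n, read_nonblocking=_read_nonblocking, read_urandom=_read_urandom):
    """Return (seed, creditable).

    creditable is True only if the kernel confirmed an initialized CRNG;
    a seed obtained through the fallback must not be credited as entropy.
    """
    try:
        return read_nonblocking(n), True
    except BlockingIOError:
        return read_urandom(n), False
```

If getrandom() silently returned weak data instead of failing, the first
branch would always succeed and the caller could no longer tell the two
cases apart.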

> But while I tried to test this, on my F30 install, systemd seems to
> always just use urandom().
> 
> I can trigger the urandom read warning easily enough (turn off CPU
> rdrand trusting and increase the entropy requirement by a factor of
> ten, and turn off the ioctl to add entropy from user space), just not
> the getrandom() blocking case at all.
>

Yeah, because the problem was/is not with systemd :)

It is GDM/gnome-session which was blocking the graphical boot process.

Regarding reproducing the issue, through a quick trace_printk, all of
the processes below are calling getrandom() on my Arch system at boot:

    https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc

The fatal call was gnome-session's one, because gnome didn't continue
_its own_ boot due to this blockage.

> So presumably that's because I have a systemd that doesn't use
> getrandom() at all, or perhaps uses the 'rdrand' instruction directly.
> Or maybe because Arch has some other oddity that just triggers the
> problem.
>

It seems Arch is good at triggering this. For example, here is
another Arch user on a Thinkpad (different model than mine), also with
GDM getting blocked on entropy:

    https://bbs.archlinux.org/viewtopic.php?id=248035
    
    "As you can see, the system is literally waiting a half minute for
    something - up until crng init is done"

(The NetworkManager logs are just noise. I also had them, but completely
 disabling NetworkManager didn't do anything .. just made the logs
 cleaner)

thanks,

--
Ahmed Darwish
http://darwish.chasingpointers.com

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-16  1:48                                 ` Vito Caputo
@ 2019-09-16  2:49                                   ` Theodore Y. Ts'o
  2019-09-16  4:29                                     ` Willy Tarreau
  0 siblings, 1 reply; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-16  2:49 UTC (permalink / raw)
  To: Vito Caputo
  Cc: Ahmed S. Darwish, Linus Torvalds, Lennart Poettering,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml

On Sun, Sep 15, 2019 at 06:48:34PM -0700, Vito Caputo wrote:
> > A small note here, especially after I've just read the commit log of
> > 72dbcf721566 ('Revert ext4: "make __ext4_get_inode_loc plug"'), which
> > unfairly blames systemd there.
    ...
> > What blocked the system boot was GDM/gnome-session implicitly calling
> > getrandom() for the Xorg MIT cookie. This was shown in the strace log
> > below:
> > 
> >    https://lkml.kernel.org/r/20190910173243.GA3992@darwi-home-pc

Yes, that's correct, this isn't really systemd's fault.  It's a
combination of GDM/gnome-session stupidly using MIT Magic Cookie at
*all* (it was a bad idea 30 years ago, and it's a bad idea in 2019),
and GDM/gnome-session using getrandom(2) at all; it should have just stuck
with /dev/urandom, or heck just used random_r(3) since when we're
talking about MIT Magic Cookie, there's no real security *anyway*.

It's also a combination of the hardware used by this particular user,
the init scripts in use that were probably not generating enough read
requests compared to other distributions (ironically, distributions
and init systems that try the hardest to accelerate the boot make this
problem worse by reducing the entropy that can be harvested from I/O).
And then when we optimized ext4 so it would be more efficient, that
tipped this particular user over the edge.

Linus might not have liked my proposal to disable the optimization if
the CRNG isn't initialized, but ultimately this problem *has* gotten
worse because we've optimized things more.  So to the extent that
systemd has made systems boot faster, you could call that systemd's
"fault" --- just as Linus reverting ext4's performance optimization is
saying that it's ext4's "fault" because we had the temerity to try to
make the file system be more efficient, and hence, reduce entropy that
can be collected.


Ultimately, though, the person who touches this last is whose "fault"
it is.  And the problem is because it really is a no-win situation
here.  No matter *what* we do, it's going to either (a) make some
systems insecure, or (b) make some systems more likely to hang while
booting.  Whether you consider the risk of (a) or (b) to be worse is
ultimately going to cause you to say that people of the contrary
opinion are either "being reckless with system security", or
"incompetent at system design".

And really, it's all going to depend on how the Linux kernel is being
used.  The fact that Linux is being used in IOT devices, mobile
handsets, desktops, servers running in VM's, user desktops, etc.,
means that there will be some situations where blocking is going to be
terrible, and some situations where a failure to provide system
security could result in risking someone's life, health, or mission
failure in some critical system.

That's why this discussion can easily get toxic.  If you are only
focusing on one part of the Linux market, then obviously *you* are the
only sane one, and everyone *else* who disagrees with you must be
incompetent.  When, perhaps, they may simply be focusing on a
different part of the ecosystem where Linux is used.

> So did systemd-random-seed instead drain what little entropy there was
> before GDM started, increasing the likelihood a subsequent getrandom()
> call would block?

No.  Getrandom(2) uses the new CRNG, which is either initialized, or
it's not.  Once it's initialized, it won't block again ever.
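
A minimal userspace sketch of that "initialized or not" check, assuming
glibc 2.25 or later for <sys/random.h>; the helper name
crng_is_initialized() is made up for illustration.  It asks for a single
byte with GRND_NONBLOCK, which fails with EAGAIN instead of sleeping
while the CRNG is still uninitialized:

```c
#include <errno.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Hypothetical helper: returns 1 if the kernel CRNG is initialized,
 * 0 if it is not yet (a plain getrandom(..., 0) would block), or -1
 * on any other error (for example ENOSYS on a very old kernel).
 */
int crng_is_initialized(void)
{
	unsigned char byte;
	ssize_t ret;

	/* Retry only if a signal interrupted the call. */
	do {
		ret = getrandom(&byte, sizeof(byte), GRND_NONBLOCK);
	} while (ret < 0 && errno == EINTR);

	if (ret == (ssize_t)sizeof(byte))
		return 1;
	if (ret < 0 && errno == EAGAIN)
		return 0;
	return -1;
}
```

Once this has returned 1, later calls should keep returning 1 for the
rest of that boot, which is the "won't block again ever" property.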

     	   	     		     	   - Ted

* Re: Linux 5.3-rc8
  2019-09-15 17:02                                 ` Linus Torvalds
@ 2019-09-16  3:23                                   ` Theodore Y. Ts'o
  2019-09-16  3:40                                     ` Linus Torvalds
  0 siblings, 1 reply; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-16  3:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Lennart Poettering, Alexander E. Patrakov, Ahmed S. Darwish,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	zhangjs, linux-ext4, lkml

On Sun, Sep 15, 2019 at 10:02:18AM -0700, Linus Torvalds wrote:
> But on a PC, we can _almost_ guarantee entropy. Even with a golden
> image, we do mix in:
> 
>  - timestamp counter on every device interrupt (but "device interrupt"
> doesn't include things like the local CPU timer, so it really needs
> device activity)
> 
>  - random boot and BIOS memory (dmi tables, the EFI RNG entry, etc)
> 
>  - various device state (things like MAC addresses when registering
> network devices, USB device numbers, etc)
> 
>  - and obviously any CPU rdrand data
> 	....
> But also note the "on a PC" part.

Hopefully there is no disagreement with this.  I completely agree that
if we only care about user desktops running on PC's, getrandom(2)
should never block, and *hopefully* a big fat kernel stack dump will
cause developers to wake up and pay attention.  And even if they don't,
essentially all modern systems have RDRAND, and RDRAND will save you.
We're also not using the EFI RNG yet, but we should, and once we do,
that will again help for all modern PC's.

However, there are exceptions here --- and we don't even need to leave
the X86 architecture.  If you are running in a VM, there won't be a
lot of interrupts, and some hosts may disable RDRAND (or are on a
system where RDRAND was buggy, and hence disabled), and the dmi tables
are pretty much constant and trivial for an attacker to deduce.

> But basically, you should never *ever* try to generate some long-lived
> key and then just wait for it without doing anything else. The
> "without doing anything else" is key here.
> 
> But every time we've had a blocking interface, that's exactly what
> somebody has done. Which is why I consider that long blocking thing to
> be completely unacceptable. There is no reason to believe that the
> wait will ever end, partly exactly because we don't consider timer
> interrupts to add any timer randomness. So if you are just waiting,
> nothing necessarily ever happens.

Ultimately, the question is whether blocking is unacceptable, or
compromising the user's security is unacceptable.  The former is much
more likely to cause users to whine on LKML and send complaints of
regressions to Linus.  No question about that.

But not blocking is *precisely* what led us to weak keys in network
devices that were sold by the millions to users in their printers,
wifi routers, etc.  And with /dev/urandom, we didn't block, and we did
issue warning messages, and it didn't stop consumer electronics
vendors from screwing up.  And then there will be another paper
published, and someone will contact security@kernel.org, and it will
be blamed on the Linux kernel, because best practice really *is* to
block until you can return cryptographic randomness, because we can
take it on *faith* that there will be some (and probably many) user
space programmers who really don't know how to do system design,
especially secure systems design.  Many of them won't even bother to
look at system logs.

And even blocking for 15 seconds may not necessarily help much, since
consumer grade electronics won't have a serial console, and hardware
engineers might not even notice a 15 second delay.  Sure, someone who
is used to a laptop booting up in 3 seconds will be super annoyed by a
15 second delay --- but there are many contexts where a 15 second
delay is nothing.

It often takes a minute or more to start up a Cloud VM, for example,
and if users aren't checking the system logs --- and most IOT
application programmers won't be checking system logs, and 15 seconds
to boot might not even be noticed during development for some devices.
And even on a big x86 server, it can take 5+ minutes for it to boot
(between BIOS and kernel probe time), so 15 seconds won't be noticed.

Linus, I know you don't like the config approach, but the problem is
there is not going to be any "one size fits all" solution, because
Linux gets used in so many places.  We can set up defaults so that for
x86, we never block and just create a big fat warning, and cross our
fingers and hope that's enough.  But on other platforms, 15 seconds
won't be the right number, and you might actually need something
closer to two minutes before the delay will be noticed.  And on some
of these other platforms, the use of "best effort" randomness might be
***far*** more catastrophic from a security perspective than on an
x86.

This is why I really want the CONFIG option.  I'm willing to believe
that the x86 architecture will mostly be safe, so we could never ask
for the option on some set of architectures (unless CONFIG_EXPERT is
enabled).  But there will be other architectures and use cases where
"never blocking" and "return best effort randomness" is going to be
unacceptable, and lead to massive security problems, that could be
quite harmful.  So for those architectures, I'd really like to make
the CONFIG option be visible, and even default it to "block".

For the embedded use case, we want it to be blatantly obvious that
there is a problem, so the developer finds it, and not the consumer.
And blocking forever really is the best way to force the embedded
programmer to notice that there is a problem, and then fix userspace,
or add a hardware RNG, etc.  And that's because for embedded
architectures, blocking really is no big deal, but letting a product
escape with a massive security hole caused by "best efforts"
randomness being garbage is, in my book, completely unacceptable.

Regards, 

						- Ted

* Re: Linux 5.3-rc8
  2019-09-16  1:40                               ` Ahmed S. Darwish
  2019-09-16  1:48                                 ` Vito Caputo
@ 2019-09-16  3:31                                 ` Linus Torvalds
  1 sibling, 0 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-16  3:31 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Lennart Poettering, Theodore Y. Ts'o, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

On Sun, Sep 15, 2019 at 6:41 PM Ahmed S. Darwish <darwish.07@gmail.com> wrote:
>
> Yes, the systemd-random-seed(8) process blocks, but this is an
> isolated process, and it's only there as a synchronization point and
> to load/restore random seeds from disk across reboots.
>
> What blocked the system boot was GDM/gnome-session implicitly calling
> getrandom() for the Xorg MIT cookie.

Aahh. I saw that email, but then in the discussion the systemd case
always ended up coming up first, and I never made the connection.

What a complete crock that silly MIT random cookie is, and what a sad
sad reason for blocking.

              Linus

* Re: Linux 5.3-rc8
  2019-09-16  3:23                                   ` Theodore Y. Ts'o
@ 2019-09-16  3:40                                     ` Linus Torvalds
  2019-09-16  3:56                                       ` Linus Torvalds
  2019-09-16 17:00                                       ` Theodore Y. Ts'o
  0 siblings, 2 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-16  3:40 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Lennart Poettering, Alexander E. Patrakov, Ahmed S. Darwish,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	zhangjs, linux-ext4, lkml

On Sun, Sep 15, 2019 at 8:23 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> But not blocking is *precisely* what led us to weak keys in network
> devices that were sold by the millions to users in their printers,
> wifi routers, etc.

Ted, just admit that you are wrong on this, instead of writing the
above kind of bad fantasy.

We have *always* supported blocking. It's called "/dev/random". And
guess what? Not blocking wasn't what led to weak keys like you try to
imply.

What led to weak keys is that /dev/random is useless and nobody sane
uses it, exactly because it always blocks.

So you claim that it is lack of blocking that is the problem, but
you're ignoring reality. You are ignoring the very real fact that
blocking is what led to people not using the blocking interface in the
first place, because IT IS THE WRONG MODEL.

It really is fundamentally wrong. Blocking by definition will never
work, because it doesn't add any entropy. So people then don't use the
blocking interface, because it doesn't _work_.

End result: they then use another interface that does work, but isn't secure.

I have told you that in this thread, and HISTORY should have told you
that. You're not listening.

If you want secure keys, you can't rely on a blocking model, because
it ends up not working. Blocking leads to problems.

If you want secure keys, you should do the exact opposite of blocking:
you should encourage people to explicitly use a non-blocking "I want
secure random numbers", and then if that fails, they should do things
that cause entropy.
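
A rough sketch of that non-blocking pattern, assuming glibc 2.25 or
later for <sys/random.h>.  The names secure_random_nonblock() and
entropy_work_fn are hypothetical, and what the work callback should
actually do (disk I/O, network traffic, asking the user to type) is left
to the caller:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical callback: do something that may cause device activity. */
typedef void (*entropy_work_fn)(void *ctx);

/*
 * Fill buf with len bytes from the initialized CRNG without ever
 * sleeping inside getrandom().  While the CRNG is not ready (EAGAIN),
 * call work() and retry, at most max_attempts times.
 * Returns 0 on success, -1 with errno set on failure (errno is EAGAIN
 * if the CRNG never became ready).
 */
int secure_random_nonblock(void *buf, size_t len,
			   entropy_work_fn work, void *ctx,
			   unsigned int max_attempts)
{
	unsigned char *p = buf;
	size_t done = 0;
	unsigned int attempts = 0;

	assert(buf != NULL || len == 0);

	while (done < len) {
		ssize_t ret = getrandom(p + done, len - done, GRND_NONBLOCK);

		if (ret > 0) {
			done += (size_t)ret;
			continue;
		}
		if (ret == 0) {
			/* Not expected for a non-zero request. */
			errno = EIO;
			return -1;
		}
		if (errno == EINTR)
			continue;
		if (errno == EAGAIN && attempts < max_attempts) {
			attempts++;
			if (work)
				work(ctx);
			continue;
		}
		return -1;
	}
	return 0;
}
```

A key generator built this way would report an error when the attempts
run out, instead of hanging forever with nothing else going on.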

So getrandom() just repeated a known broken model. And you're
parroting that same old known broken stuff. It didn't work with
/dev/random, why do you think it magically works with getrandom()?

Stop fighting reality.

The fact is, either you have sufficient entropy or you don't.

 - if you have sufficient entropy, blocking is stupid and pointless

 - if you don't have sufficient entropy, blocking is exactly the wrong
thing to do.

Seriously. Don't make excuses for bad interfaces. We should have
learnt this long ago.

             Linus

* Re: Linux 5.3-rc8
  2019-09-11 16:07         ` Theodore Y. Ts'o
  2019-09-11 16:45           ` Linus Torvalds
@ 2019-09-16  3:52           ` Herbert Xu
  2019-09-16  4:21             ` Linus Torvalds
  1 sibling, 1 reply; 211+ messages in thread
From: Herbert Xu @ 2019-09-16  3:52 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: torvalds, darwish.07, adilger.kernel, jack, rstrode, mccann,
	zachary, linux-ext4, linux-kernel

Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> Ultimately, though, we need to find *some* way to fix userspace's
> assumptions that they can always get high quality entropy in early
> boot, or we need to get over people's distrust of Intel and RDRAND.
> Otherwise, future performance improvements in any part of the system
> which reduces the number of interrupts is always going to potentially
> result in somebody's misconfigured system or badly written
> applications to fail to boot.  :-(

Can we perhaps artificially increase the interrupt rate while the
CRNG is not initialised?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: Linux 5.3-rc8
  2019-09-16  3:40                                     ` Linus Torvalds
@ 2019-09-16  3:56                                       ` Linus Torvalds
  2019-09-16 17:00                                       ` Theodore Y. Ts'o
  1 sibling, 0 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-16  3:56 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Lennart Poettering, Alexander E. Patrakov, Ahmed S. Darwish,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	zhangjs, linux-ext4, lkml

On Sun, Sep 15, 2019 at 8:40 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> If you want secure keys, you can't rely on a blocking model, because
> it ends up not working. Blocking leads to problems.

Side note: I'd argue that (despite my earlier mis-understanding) the
only really valid use of "block until there is entropy" is the
systemd-random-seed model that blocks not because it wants a secure
key, but blocks because it wants to save the (now properly) random
seed for later.

So apologies to Lennart - he was very much right, and I mis-understood
Ahmed's bug report. Systemd was blameless, and blocked correctly.

While blocking for actual random keys was the usual bug, just for that
silly and pointless MIT cookie that doesn't even need the secure
randomness.

But because the getrandom() interface was mis-designed (and only
_looks_ like a more convenient interface for /dev/urandom, without
being one), the MIT cookie code got the blocking whether it wanted to
or not.

Just say no to blocking for key data.

            Linus

* Re: Linux 5.3-rc8
  2019-09-16  3:52           ` Herbert Xu
@ 2019-09-16  4:21             ` Linus Torvalds
  2019-09-16  4:53               ` Willy Tarreau
  0 siblings, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-16  4:21 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Theodore Y. Ts'o, Ahmed S. Darwish, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4,
	Linux List Kernel Mailing

On Sun, Sep 15, 2019 at 8:52 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> Can we perhaps artificially increase the interrupt rate while the
> CRNG is not initialised?

Long term (or even medium term in some areas), the problem really is
that device interrupts during boot really are going away, rather than
becoming more common.

That just happened to be the case now because of improved plugging,
but it's fundamentally the direction any storage is moving with faster
flash interfaces.

The only interrupt we could easily increase the rate of in the kernel
is the timer interrupt, but that's also the interrupt that is the
least useful for randomness.

The timer interrupt could be somewhat interesting if you are also
CPU-bound on a non-trivial load, because then "what program counter
got interrupted" ends up being possibly unpredictable - even with a
very stable timer interrupt source - and effectively stands in for a
cycle counter even on hardware that doesn't have a native TSC. Lots of
possible low-level jitter there to use for entropy. But especially if
you're just idly _waiting_ for entropy, you won't be "CPU-bound on an
interesting load" - you'll just hit the CPU idle loop all the time so
even that wouldn't work.

But practically speaking, timers really are not much of an
option. And if we are idle, even having a high-frequency TSC isn't all
that useful with the timer interrupt, because the two tend to be very
intimately related.

Of course, if you're generating a host key for SSH or something like
that, you could try to at least cause some network traffic while
generating the key. That's not much of an option for the _kernel_, but
for a program like ssh-keygen it certainly could be.

Blocking is fine if you simply don't care about time at all (the "five
hours later is fine" situation), or if you have some a-priori
knowledge that the machine is doing real interesting work that will
generate entropy. But I don't see how the kernel can generate entropy
on its own, particularly during boot (which is when the problem
happens), when most devices aren't even necessarily meaningfully set
up yet.

Hopefully hw random number generators will make this issue effectively
moot before we really end up having the "nvdimms and their ilk are
common enough that you really have no early boot irq-driven disk IO at
all".

           Linus

* Re: Linux 5.3-rc8
  2019-09-16  2:49                                   ` Theodore Y. Ts'o
@ 2019-09-16  4:29                                     ` Willy Tarreau
  2019-09-16  5:02                                       ` Linus Torvalds
  0 siblings, 1 reply; 211+ messages in thread
From: Willy Tarreau @ 2019-09-16  4:29 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Vito Caputo, Ahmed S. Darwish, Linus Torvalds,
	Lennart Poettering, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	lkml

Hi Ted,

On Sun, Sep 15, 2019 at 10:49:04PM -0400, Theodore Y. Ts'o wrote:
> No matter *what* we do, it's going to either (a) make some
> systems insecure, or (b) make some systems more likely to hang while
> booting.

I continue to strongly disagree with opposing these two. (b) is
caused precisely because of this conflation. Life long keys are
produced around once per system's life (at least this order of
magnitude). Boot happens way more often. Users would not complain
that systems fail to start if the two types of random are properly
distinguished so that we don't fail to boot just for the sake of
secure randoms that will never be consumed as such.

Before systems had HWRNGs it was pretty common for some tools to
ask the user to type hundreds of characters on the keyboard and
use that (content+timings) to feed entropy while generating a key.
This is acceptable once in a system's life. And on some systems
with no entropy like VMs, it's commonly generated from a central
place and never from the VM itself, so it's not a problem either.

In my opinion the problem recently happened because getrandom()
was perceived as a good replacement for /dev/urandom and is way
more convenient to use, so applications progressively started to
use it without realizing that contrary to its ancestor it can
block. And each system that fails to boot confirms that entropy
still remains a problem even on PCs in 2019. This is one more
reason for clearly keeping two interfaces depending on what type
of random is needed.

I'd be in favor of adding in the man page something like "this
random source is only suitable for applications which will not be
harmed by getting a predictable value on output, and as such it is
not suitable for generation of system keys or passwords, please
use GRND_RANDOM for this". This distinction currently is not clear
enough for people who don't know this subtle difference, and can
increase the interface's misuse.

Regards,
Willy

* Re: Linux 5.3-rc8
  2019-09-16  4:21             ` Linus Torvalds
@ 2019-09-16  4:53               ` Willy Tarreau
  0 siblings, 0 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-16  4:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Herbert Xu, Theodore Y. Ts'o, Ahmed S. Darwish,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	zhangjs, linux-ext4, Linux List Kernel Mailing

On Sun, Sep 15, 2019 at 09:21:06PM -0700, Linus Torvalds wrote:
> The timer interrupt could be somewhat interesting if you are also
> CPU-bound on a non-trivial load, because then "what program counter
> got interrupted" ends up being possibly unpredictable - even with a
> very stable timer interrupt source - and effectively stands in for a
> cycle counter even on hardware that doesn't have a native TSC. Lots of
> possible low-level jitter there to use for entropy. But especially if
> you're just idly _waiting_ for entropy, you won't be "CPU-bound on an
> interesting load" - you'll just hit the CPU idle loop all the time so
> even that wouldn't work.

In the old DOS era, I used to produce randoms by measuring the time it
took for some devices to reset themselves (typically 8250 UARTs could
take in the order of milliseconds). And reading their status registers
during the reset phase used to show various sequences of flags at
approximate timings.

I suspect this method is still usable, even with SoCs full of peripherals,
in part because not all clocks are synchronous, so we can retrieve a
little bit of entropy by measuring edge transitions. I don't know how
we can assess the number of bits provided by such method (probably
log2(card(discrete values))) but maybe this is something we should
progressively encourage drivers authors to do in the various device
probing functions once we figure the best way to do it.

The idea is around this. Instead of :

     probe(dev)
     {
          (...)
          while (timeout && !(status_reg & STATUS_RDY))
               timeout--;
          (...)
     }

We could do something like this (assuming 1 bit of randomness here) :

     probe(dev)
     {
          (...)
          prev_timeout = timeout;
          prev_reg     = status_reg;
          while (timeout && !(status_reg & STATUS_RDY)) {
               if (status_reg != prev_reg) {
                     /* timeout counts down, so this delta is positive */
                     add_device_randomness_bits(prev_timeout - timeout, 1);
                     prev_timeout = timeout;
                     prev_reg = status_reg;
               }
               timeout--;
          }
          (...)
     }
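
The same loop can be tried out in userspace with the status register
replaced by a caller-supplied read function.  This is a sketch only:
poll_and_sample(), struct fake_dev and fake_read() are made-up names,
STATUS_RDY is given an arbitrary value, and the deltas are stored in an
array instead of being passed to the hypothetical
add_device_randomness_bits():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_RDY 0x80u	/* arbitrary "ready" bit for this sketch */

typedef uint32_t (*read_status_fn)(void *ctx);

/*
 * Poll until the device reports ready or the timeout expires.  Each
 * time the status value changes, record how many loop iterations have
 * passed since the previous change.  Returns the number of samples.
 */
size_t poll_and_sample(read_status_fn read_status, void *ctx,
		       unsigned int timeout,
		       unsigned int *samples, size_t max_samples)
{
	unsigned int prev_timeout = timeout;
	uint32_t status, prev;
	size_t n = 0;

	assert(read_status != NULL);
	assert(samples != NULL || max_samples == 0);

	status = read_status(ctx);
	prev = status;
	while (timeout && !(status & STATUS_RDY)) {
		timeout--;
		status = read_status(ctx);
		if (status != prev) {
			if (n < max_samples)
				samples[n++] = prev_timeout - timeout;
			prev_timeout = timeout;
			prev = status;
		}
	}
	return n;
}

/* Scripted stand-in for a device: replays a fixed list of values. */
struct fake_dev {
	const uint32_t *script;
	size_t len;	/* must be at least 1 */
	size_t pos;
};

uint32_t fake_read(void *ctx)
{
	struct fake_dev *dev = ctx;
	uint32_t v = dev->script[dev->pos];

	/* Stay on the last entry once the script runs out. */
	if (dev->pos + 1 < dev->len)
		dev->pos++;
	return v;
}
```

With a script of 0x00, 0x00, 0x01, 0x01, 0x01, 0x80 the function records
two deltas, 2 and 3.  On real hardware the interesting part is how much
those deltas vary from one boot to the next, which this simulation
cannot show.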

It's also interesting to note that on many motherboards there are still
multiple crystal oscillators (typically one per ethernet port) and that
such types of independent, free-running clocks do present unpredictable
edges compared to the CPU's clock, so when they affect the device's
setup time, this does help quite a bit.

Willy

* Re: Linux 5.3-rc8
  2019-09-16  4:29                                     ` Willy Tarreau
@ 2019-09-16  5:02                                       ` Linus Torvalds
  2019-09-16  6:12                                         ` Willy Tarreau
  0 siblings, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-16  5:02 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Theodore Y. Ts'o, Vito Caputo, Ahmed S. Darwish,
	Lennart Poettering, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	lkml

On Sun, Sep 15, 2019 at 9:30 PM Willy Tarreau <w@1wt.eu> wrote:
>
> I'd be in favor of adding in the man page something like "this
> random source is only suitable for applications which will not be
> harmed by getting a predictable value on output, and as such it is
> not suitable for generation of system keys or passwords, please
> use GRND_RANDOM for this".

The problem with GRND_RANDOM is that it also ends up extracting
entropy, and has absolutely horrendous performance behavior. It's why
hardly anybody uses /dev/random.

Which nobody should really ever do. I don't understand why people want
that thing, considering that the second law of thermodynamics really
pretty much applies. If you can crack the cryptographic hashes well
enough to break them despite reseeding etc, people will have much more
serious issues than the entropy accounting.

So the problem with getrandom() is that it only offered two flags, and
to make things worse they were the wrong ones.

Nobody should basically _ever_ use the silly "entropy can go away"
model, yet that is exactly what GRND_RANDOM does.

End result: GRND_RANDOM is almost entirely useless, and is actively
dangerous, because it can actually block not just during boot, it can
block (and cause others to block) during random running of the system
because it does that entropy accounting.

Nobody can use GRND_RANDOM if they have _any_ performance requirements
what-so-ever. It's possibly useful for one-time ssh host keys etc.

So GRND_RANDOM is just bad - with or without GRND_NONBLOCK, because
even in the nonblocking form it will account for entropy in the
blocking pool (until it's all gone, and it will return -EAGAIN).

And the non-GRND_RANDOM case avoids that problem, but requires the
initial entropy with no way to opt out of it. Yes, GRND_NONBLOCK makes
it work.

So we have four flag combinations:

 - 0 - don't use if it could possibly run at boot

   Possibly useful for the systemd-random-seed case, and if you *know*
you're way past boot, but clearly overused.

   This is the one that bit us this time.

 - GRND_NONBLOCK - fine, but you now don't get even untrusted random
numbers, and you have to come up with a way to fill the entropy pool

   This one is most useful as a quick "get me urandom", but needs a
fallback to _actual_ /dev/urandom when it fails.

   This is the best choice by far, and has no inherent downsides apart
from needing that fallback code.

 - GRND_RANDOM - don't use

   This will block and it will decrease the blocking pool entropy so
that others will block too, and has horrible performance.

   Just don't use it outside of very occasional non-serious work.

   Yes, it will give you secure numbers, but because of performance
issues it's not viable for any serious code, and obviously not for
bootup.

   It can be useful as a seed for future serious use that just does
all random handling in user space. Just not during boot.

 - GRND_RANDOM | GRND_NONBLOCK - don't use

   This won't block, but it will decrease the blocking pool entropy.

   It might be an acceptable "get me a truly secure string with reliable
performance", but when it fails, you're going to be unhappy, and there
is no obvious fallback.

So three out of four flag combinations end up being mostly "don't
use", and the fourth one isn't what you'd normally want (which is just
plain /dev/urandom semantics).
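
The GRND_NONBLOCK case with its fallback could look roughly like this in
userspace.  This is only a sketch, assuming glibc 2.25 or later for
<sys/random.h>; urandom_bytes() and read_urandom() are illustrative
names, not an existing API:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes from /dev/urandom; 0 on success, -1 on error. */
static int read_urandom(unsigned char *p, size_t len)
{
	int fd = open("/dev/urandom", O_RDONLY);

	if (fd < 0)
		return -1;
	while (len > 0) {
		ssize_t ret = read(fd, p, len);

		if (ret > 0) {
			p += ret;
			len -= (size_t)ret;
		} else if (ret == 0 || errno != EINTR) {
			close(fd);
			return -1;
		}
	}
	close(fd);
	return 0;
}

/*
 * "Get me urandom": try getrandom(GRND_NONBLOCK) first, and fall back
 * to /dev/urandom if the CRNG is not initialized yet (EAGAIN), the
 * syscall is missing (ENOSYS), or anything else unexpected happens.
 * The fallback data may be predictable early in boot, so this is NOT
 * meant for long-lived keys.  Returns 0 on success, -1 on error.
 */
int urandom_bytes(void *buf, size_t len)
{
	unsigned char *p = buf;

	assert(buf != NULL || len == 0);

	while (len > 0) {
		ssize_t ret = getrandom(p, len, GRND_NONBLOCK);

		if (ret > 0) {
			p += ret;
			len -= (size_t)ret;
		} else if (ret < 0 && errno == EINTR) {
			continue;
		} else {
			return read_urandom(p, len);
		}
	}
	return 0;
}
```

Something like the MIT cookie code would call urandom_bytes() and never
block, at the cost of carrying the extra fallback path.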

                     Linus

* Re: Linux 5.3-rc8
  2019-09-16  5:02                                       ` Linus Torvalds
@ 2019-09-16  6:12                                         ` Willy Tarreau
  2019-09-16 16:17                                           ` Linus Torvalds
  0 siblings, 1 reply; 211+ messages in thread
From: Willy Tarreau @ 2019-09-16  6:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Vito Caputo, Ahmed S. Darwish,
	Lennart Poettering, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	lkml

On Sun, Sep 15, 2019 at 10:02:02PM -0700, Linus Torvalds wrote:
> On Sun, Sep 15, 2019 at 9:30 PM Willy Tarreau <w@1wt.eu> wrote:
> >
> > I'd be in favor of adding in the man page something like "this
> > random source is only suitable for applications which will not be
> > harmed by getting a predictable value on output, and as such it is
> > not suitable for generation of system keys or passwords, please
> > use GRND_RANDOM for this".
> 
> The problem with GRND_RANDOM is that it also ends up extracting
> entropy, and has absolutely horrendous performance behavior. It's why
> hardly anybody uses /dev/random.
>
> Which nobody should really ever do. I don't understand why people want
> that thing, considering that the second law of thermodynamics really
> pretty much applies. If you can crack the cryptographic hashes well
> enough to break them despite reseeding etc, people will have much more
> serious issues than the entropy accounting.

That's exactly what I was thinking about a few minutes ago and which
drove me back to mutt :-)

> So the problem with getrandom() is that it only offered two flags, and
> to make things worse they were the wrong ones.
(...)
>  - GRND_RANDOM | GRND_NONBLOCK - don't use
> 
>    This won't block, but it will decrease the blocking pool entropy.
> 
>    It might be an acceptable "get me a truly secure string with reliable
> performance", but when it fails, you're going to be unhappy, and there
> is no obvious fallback.
> 
> So three out of four flag combinations end up being mostly "don't
> use", and the fourth one isn't what you'd normally want (which is just
> plain /dev/urandom semantics).

I'm seeing it from a different angle. I now understand better why
getrandom() absolutely wants to have an initialized pool, it's to
encourage private key producers to use a secure, infinite source of
randomness. Something that neither /dev/random nor /dev/urandom
reliably provide. Unfortunately it does it by changing how urandom
works while it ought to have done it as the replacement of /dev/random.

The 3 random generation behaviors we currently support are :

  - /dev/random: only returns safe random (blocks), AND depletes entropy.
    getrandom(GRND_RANDOM) does the same.
  - /dev/urandom: returns whatever (never blocks), inexhaustible
  - getrandom(0): returns safe random (blocks), inexhaustible

Historically we used to want to rely on /dev/random for SSH keys and
certificates. It's arguable that with the massive increase of crypto
usage, what used to be done only once in a system's life happens a
bit more often and using /dev/random here can sometimes become a
problem because it harms the whole system (thus why I said I think that
we could almost require CAP_something to access it). Applications
falling back to /dev/urandom obviously resulted in the massive mess
we've seen years ago, even if it apparently solved the problem for
their users. Thus getrandom(0) does make sense, but not as an
alternative to urandom but to random, since it returns randoms safe
for use for long lived keys.

Couldn't we simply change the way things work ? Make GRND_RANDOM *not*
deplete entropy, and document it as the only safe source, and make the
default call return the same as /dev/urandom ? We can then use your
timeout mechanism for the first one (which is not supposed to be called
often and would be more accepted with a moderately long delay).

Applications need to evolve as well. It's fine to use libraries to do
whatever you need for you but ultimately the lib exports a function for
a generic use case and doesn't know how to best adapt to the use case.
Typically I would expect an SSH/HTTP daemon running in a recovery
initramfs to produce unsafe randoms so that I can connect there without
having to dance around it. However the self-signed cert produced there
must not be saved, just like the SSH host key. But this means that the
application (here the ssh-keygen or openssl) also needs to be taught to
purposely produce insecure keys when explicitly instructed to do so.
Otherwise we know what will happen in the long term, since history
repeats itself as long as the conditions are not changed :-/

Willy

* Re: Linux 5.3-rc8
  2019-09-10 11:56   ` Theodore Y. Ts'o
@ 2019-09-16 10:33     ` Christoph Hellwig
  0 siblings, 0 replies; 211+ messages in thread
From: Christoph Hellwig @ 2019-09-16 10:33 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Ahmed S. Darwish, Andreas Dilger, Linus Torvalds, Jan Kara,
	zhangjs, linux-ext4, linux-kernel

On Tue, Sep 10, 2019 at 07:56:35AM -0400, Theodore Y. Ts'o wrote:
> Hmm, I'm not seeing this on a Dell XPS 13 (model 9380) using a Debian
> Bullseye (Testing) running a rc4+ kernel.
> 
> This could be because Debian is simply doing more I/O; or it could be
> because I don't have some package installed which is trying to read
> from /dev/random or calling getrandom(2).  Previously, Fedora ran into
> blocking issues because of some FIPS compliance patches to some
> userspace daemons.  So it's going to be very user space dependent and
> package dependent.

Btw, I've been seeing this issue on debian testing with an XFS root
file system ever since the blocking random changes went in.  There
are a few reports (not from me) in the BTS since.  I ended up just
giving up on gdm and using lightdm instead as it was clearly related
to that.

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-16  6:12                                         ` Willy Tarreau
@ 2019-09-16 16:17                                           ` Linus Torvalds
  2019-09-16 17:21                                             ` Theodore Y. Ts'o
  0 siblings, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-16 16:17 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Theodore Y. Ts'o, Vito Caputo, Ahmed S. Darwish,
	Lennart Poettering, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	lkml

On Sun, Sep 15, 2019 at 11:13 PM Willy Tarreau <w@1wt.eu> wrote:
>
> >
> > So three out of four flag combinations end up being mostly "don't
> > use", and the fourth one isn't what you'd normally want (which is just
> > plain /dev/urandom semantics).
>
> I'm seeing it from a different angle. I now understand better why
> getrandom() absolutely wants to have an initialized pool, it's to
> encourage private key producers to use a secure, infinite source of
> randomness.

Right. There is absolutely no question that that is a useful thing to have.

And that's what GRND_RANDOM _should_ have meant. But didn't.

So the semantics that getrandom() should have had are:

 getrandom(0) - just give me reasonable random numbers for any of a
million non-strict-long-term-security use (ie the old urandom)

    - the nonblocking flag makes no sense here and would be a no-op

 getrandom(GRND_RANDOM) - get me actual _secure_ random numbers with
blocking until entropy pool fills (but not the completely invalid
entropy decrease accounting)

    - the nonblocking flag is useful for bootup and for "I will
actually try to generate entropy".

and both of those are very very sensible actions. That would actually
have _fixed_ the problems we had with /dev/[u]random, both from a
performance standpoint and for a filesystem access standpoint.

But that is sadly not what we have right now.
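
For reference, the four flag combinations being discussed can be laid
out as a small table. This is only a sketch in Python, not kernel code:
the flag values are the ones from <linux/random.h>, while the
description strings are an informal summary of what this thread says
about current kernels, not an authoritative specification.

```python
import os

# Values from <linux/random.h>; fall back to the literals if the os module
# on this platform does not export them.
GRND_NONBLOCK = getattr(os, "GRND_NONBLOCK", 0x0001)
GRND_RANDOM = getattr(os, "GRND_RANDOM", 0x0002)

# Informal summary of the behaviour described in this thread.
CURRENT_BEHAVIOUR = {
    0: "block until the CRNG is initialized, then never block again",
    GRND_NONBLOCK: "like 0, but fail with EAGAIN instead of blocking",
    GRND_RANDOM: "block like /dev/random, with entropy accounting",
    GRND_RANDOM | GRND_NONBLOCK:
        "like GRND_RANDOM, but fail with EAGAIN instead of blocking",
}


def describe(flags: int) -> str:
    """Return the informal description for one flag combination."""
    return CURRENT_BEHAVIOUR[flags & (GRND_NONBLOCK | GRND_RANDOM)]


if __name__ == "__main__":
    for flags in sorted(CURRENT_BEHAVIOUR):
        print(f"getrandom(flags={flags:#x}): {describe(flags)}")
```

Note that none of the four rows is "never block, never fail", which is
the plain /dev/urandom behaviour.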

And I suspect we can't fix it, since people have grown to depend on
the old behavior, and already know to avoid GRND_RANDOM because it's
useless with old kernels even if we fixed it with new ones.

Does anybody really seriously debate the above? Ted? Are you seriously
trying to claim that the existing GRND_RANDOM has any sensible use?
Are you seriously trying to claim that the fact that we don't have a
sane urandom source is a "feature"?

                   Linus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-16  3:40                                     ` Linus Torvalds
  2019-09-16  3:56                                       ` Linus Torvalds
@ 2019-09-16 17:00                                       ` Theodore Y. Ts'o
  2019-09-16 17:07                                         ` Linus Torvalds
  1 sibling, 1 reply; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-16 17:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Lennart Poettering, Alexander E. Patrakov, Ahmed S. Darwish,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	zhangjs, linux-ext4, lkml

On Sun, Sep 15, 2019 at 08:40:30PM -0700, Linus Torvalds wrote:
> On Sun, Sep 15, 2019 at 8:23 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
> >
> > But not blocking is *precisely* what led us to weak keys in network
> > devices that were sold by the millions to users in their printers,
> > wifi routers, etc.
> 
> Ted, just admit that you are wrong on this, instead of writing the
> above kind of bad fantasy.
> 
> We have *always* supported blocking. It's called "/dev/random". And
> guess what? Not blocking wasn't what led to weak keys like you try to
> imply.
> 
> What led to weak keys is that /dev/random is useless and nobody sane
> uses it, exactly because it always blocks.

How /dev/random blocks is very different from how getrandom(2) blocks.
Getrandom(2) blocks until the CRNG is initialized, and then it never
blocks again.  /dev/random tries to do entropy accounting, and it
blocks randomly all the time.  *That* is why it is useless.  I agree
that /dev/random is bad, but I think you're taking the wrong message
from it.  It's not
that blocking is always bad; it's that insisting on entropy accounting
and "true randomness" is bad.
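
As an illustration of the difference, here is a minimal sketch of how
userspace can observe both mechanisms. It assumes Linux and a Python
that exposes os.getrandom(); the helper names are made up for this
example. The first function probes the one-time "is the CRNG
initialized" condition that getrandom(2) waits for; the second reads
the entropy estimate that /dev/random's accounting is based on.

```python
import os


def crng_ready() -> bool:
    """Probe whether getrandom() would return without blocking."""
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:  # EAGAIN: the CRNG is not initialized yet
        return False
    return True


def entropy_estimate(
    path: str = "/proc/sys/kernel/random/entropy_avail",
) -> int:
    """Read the kernel's current entropy estimate, in bits."""
    with open(path) as f:
        return int(f.read().strip())
```

Once crng_ready() has returned True it should keep returning True for
the lifetime of the boot, whereas the entropy estimate is the number
that goes up and down under entropy accounting.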

The getrandom(2) system call is modelled after *BSD's getentropy(2)
call, and the fact that everyone is using it is because for most use
cases, it really is the right way to go.

I think that's the core of my disagreement with you.  I agree that
what /dev/random does is wrong, and to date, we've kept it for
backwards compatibility reasons.  Some of these reasons could be
rational, or at least debated.  For example, GPG wants to use
/dev/random because it thinks it's more secure, and if they are
generating 4096 bit RSA keys, or something else which might be
"post-quantum cryptography", it's possible that /dev/random is going
to be better than the CRNG for the hyper-paranoid.  Other use cases,
such as some PCI compliance labs who think that getrandom(2) is not
sufficiently secure, are just purely insane --- but that's assuming
today's getrandom(2) is guaranteed to return cryptographically strong
results, or nothing at all.

If we change the existing behavior of getrandom(2) with the default
flags to mean, "we return whatever we feel like, and this includes
something which looks random, but might be trivially reverse
engineered by a research engineer", that is in my mind, a Really Bad
Thing To Do.  And no, a big fat warning isn't sufficient, because
there will be some systems integrators and application programmers who
will ignore the kernel warning message.  They might not even look at
dmesg, and a system console might not exist.

					- Ted

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-16 17:00                                       ` Theodore Y. Ts'o
@ 2019-09-16 17:07                                         ` Linus Torvalds
  0 siblings, 0 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-16 17:07 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Lennart Poettering, Alexander E. Patrakov, Ahmed S. Darwish,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	zhangjs, linux-ext4, lkml

On Mon, Sep 16, 2019 at 10:00 AM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> How /dev/random blocks is very different from how getrandom(2) blocks.
> Getrandom(2) blocks until the CRNG is initialized, and then it never
> blocks again.

Yes and no.

getrandom() very much blocks exactly like /dev/random, when you give
it the GRND_RANDOM flag.

Which is completely broken, and was already known to be broken. So
that flag is just plain stupid.

And getrandom() does *not* block like /dev/urandom does (ie not at
all), which was actually useful, and very widely used.

So you really have the worst of both worlds.

Yes, getrandom(0) does what /dev/random _should_ have done, and what
getrandom(GRND_RANDOM) should be but isn't.

But by making the choice it did, we now have three useless flag
combinations, and we lack one people _want_ and need.

And this design mistake very much caused the particular bug we are now hitting.

                  Linus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-16 16:17                                           ` Linus Torvalds
@ 2019-09-16 17:21                                             ` Theodore Y. Ts'o
  2019-09-16 17:44                                               ` Linus Torvalds
                                                                 ` (3 more replies)
  0 siblings, 4 replies; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-16 17:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Willy Tarreau, Vito Caputo, Ahmed S. Darwish, Lennart Poettering,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml

On Mon, Sep 16, 2019 at 09:17:10AM -0700, Linus Torvalds wrote:
> So the semantics that getrandom() should have had are:
> 
>  getrandom(0) - just give me reasonable random numbers for any of a
> million non-strict-long-term-security use (ie the old urandom)
> 
>     - the nonblocking flag makes no sense here and would be a no-op

That change is what I consider highly problematic.  There are a *huge*
number of applications which use cryptography which assume that
getrandom(0) means, "I'm guaranteed to get something safe for
cryptographic use".  Changing this now would leave a very large number
of applications insecure.  Part of the problem here is that
there are many different actors.  There is the application or
cryptographic library developer, who may want to be sure they have
cryptographically secure random numbers.  They are the ones who will
select getrandom(0).

Then you have the distribution or consumer-grade electronics
developers who may choose to run them too early in some init script or
systemd unit files.  And some of these people may do something stupid,
like run things too early, or omit a hardware random number
generator in their design, even though it's for a security critical
purpose (say, a digital wallet for bitcoin).  Because some of these
people might do something stupid, one argument (not mine) is that we
must therefore not let getrandom() block.  But doing this penalizes
the security of all the users of the application, not just the stupid
ones.

>  getrandom(GRND_RANDOM) - get me actual _secure_ random numbers with
> blocking until entropy pool fills (but not the completely invalid
> entropy decrease accounting)
> 
>     - the nonblocking flag is useful for bootup and for "I will
> actually try to generate entropy".
> 
> and both of those are very very sensible actions. That would actually
> have _fixed_ the problems we had with /dev/[u]random, both from a
> performance standpoint and for a filesystem access standpoint.
> 
> But that is sadly not what we have right now.
> 
> And I suspect we can't fix it, since people have grown to depend on
> the old behavior, and already know to avoid GRND_RANDOM because it's
> useless with old kernels even if we fixed it with new ones.

I don't think we can fix it, because it's the changing of
getrandom(0)'s behavior which is the problem, not GRND_RANDOM.  People
*expect* getrandom(0) to always return secure results.  I don't think
we can make it sometimes return not-necessarily secure results
depending on when the systems integrator or distribution decides to
run the application, and depending on the hardware platform (yes,
traditional x86 systems are probably fine, and fortunately x86
embedded CPU are too expensive and have lousy power management, so no
one really uses x86 for embedded yet, despite Intel's best efforts).
That would just be a purely irresponsible thing to do, IMO.

> Does anybody really seriously debate the above? Ted? Are you seriously
> trying to claim that the existing GRND_RANDOM has any sensible use?
> Are you seriously trying to claim that the fact that we don't have a
> sane urandom source is a "feature"?

There are people who can debate that GRND_RANDOM has any sensible use
cases.  GPG uses /dev/random, and that was a fully informed choice.
I'm not convinced, because I think that at least for now the CRNG is
perfectly fine for 99.999% of the use cases.  Yes, in a post-quantum
cryptography world, the CRNG might be screwed --- but so will most of
the other cryptographic algorithms in the kernel.  So if anyone ever
gets post-quantum cryptoanalytic attacks working, the use of the CRNG
is going to be the least of our problems.

As I mentioned to you in Lisbon, I've been going back and forth about
whether or not to rip out the entire /dev/random infrastructure,
mainly for code maintainability reasons.  The only reason why I've
been holding back is because there are (very few) non-insane people
who do want to use it.  There is also a much larger number of rational
people who use it because they want some insane PCI compliance labs to
go away.  What I suspect most of them are actually doing in practice
is they use /dev/random, but they also use a hardware random number
generator so /dev/random never actually blocks in practice.  The use
of /dev/random is enough to make the PCI compliance lab go away, and
the hardware random number generator (or virtio-rng on a VM) makes
/dev/random useable.

But I don't think we can reuse GRND_RANDOM for that reason.

We could create a new flag, GRND_INSECURE, which never blocks.  And
that allows us to solve the problem for silly applications that
are using getrandom(2) for non-cryptographic use cases.  Use cases
might include Python dictionary seeds, gdm for MIT Magic Cookie, UUID
generation where best efforts probably is good enough, etc.  The
answer today is they should just use /dev/urandom, since that exists
today, and we have to support it for backwards compatibility anyway.
It sounds like gdm recently switched to getrandom(2), and I suspect
that it's going to get caught on some hardware configs anyway, even
without the ext4 optimization patch.  So I suspect gdm will switch
back to /dev/urandom, and this particular pain point will probably go
away.
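
For such non-cryptographic callers, a "never wait" helper might look
like the sketch below. This is only an illustration in Python, with a
made-up function name, and not what gdm or any other project actually
does: it tries a non-blocking getrandom() first and falls back to
reading /dev/urandom, which does not wait for the CRNG.

```python
import os


def best_effort_bytes(n: int) -> bytes:
    """Return n bytes without ever waiting for the CRNG to initialize.

    Only suitable for non-cryptographic uses (hash seeds, best-effort
    UUIDs and the like); the result may be predictable early in boot.
    """
    getrandom = getattr(os, "getrandom", None)
    if getrandom is not None:
        try:
            data = getrandom(n, os.GRND_NONBLOCK)
            if len(data) == n:
                return data
        except OSError:
            pass  # EAGAIN (CRNG not ready) or ENOSYS: use /dev/urandom

    # Unbuffered, so we read exactly the number of bytes we ask for.
    with open("/dev/urandom", "rb", buffering=0) as f:
        out = b""
        while len(out) < n:
            out += f.read(n - len(out))
        return out
```

The important property is the absence of any blocking path, not the
quality of the output.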

						- Ted

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-16 17:21                                             ` Theodore Y. Ts'o
@ 2019-09-16 17:44                                               ` Linus Torvalds
  2019-09-16 17:55                                                 ` Serge Belyshev
                                                                   ` (2 more replies)
  2019-09-16 18:00                                               ` Linux 5.3-rc8 Alexander E. Patrakov
                                                                 ` (2 subsequent siblings)
  3 siblings, 3 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-16 17:44 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Willy Tarreau, Vito Caputo, Ahmed S. Darwish, Lennart Poettering,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml

On Mon, Sep 16, 2019 at 10:21 AM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> We could create a new flag, GRND_INSECURE, which never blocks.  And
> that allows us to solve the problem for silly applications that
> are using getrandom(2) for non-cryptographic use cases.

Note that getting "reasonably good random numbers" is definitely not silly.

If you are doing things like just shuffling a deck of cards for
playing solitaire on your computer, getting a good enough source of
randomness is nontrivial. Using getrandom() for that is a _very_ valid
use. But it obviously does not need _secure_ random numbers.

It is, in fact, _so_ random that we give that AT_RANDOM thing to every
new process because people want things like that. Sadly, people often
aren't aware of it, and don't use that as much as they could.
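
For those who have not met AT_RANDOM: it is an auxiliary-vector entry
pointing at 16 random bytes that the kernel places in every new
process. A C program would normally fetch it with
getauxval(AT_RANDOM); the sketch below does the equivalent from Python
by parsing /proc/self/auxv, purely as an illustration (the function
name is made up, and it assumes /proc is mounted and readable).

```python
import ctypes
import struct

AT_RANDOM = 25  # auxiliary vector key, from <elf.h>


def at_random_bytes(auxv_path: str = "/proc/self/auxv") -> bytes:
    """Return the 16 per-process random bytes the kernel passed at exec."""
    entry = struct.Struct("@LL")  # (a_type, a_val) as native unsigned longs
    with open(auxv_path, "rb") as f:
        raw = f.read()
    usable = len(raw) - len(raw) % entry.size
    for a_type, a_val in entry.iter_unpack(raw[:usable]):
        if a_type == AT_RANDOM:
            # a_val is the address of the bytes inside our own process.
            return ctypes.string_at(a_val, 16)
    raise LookupError("AT_RANDOM not present in auxv")
```

The C library also uses these bytes for its own purposes, so a program
that reuses them should not print or otherwise leak them.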

(Btw, we should probably also mix in other per-process state, because
right now people have actually attacked the boot-time AT_RANDOM to
find canary data etc).

So I think you are completely out to lunch by calling these "insecure"
things "silly". They are very very common. *WAY* more common than
making a long-term secure key will ever be. There's just a lot of use
of reasonable randomness.

You also are ignoring that we have an existing problem with existing
applications. That happened exactly because those things are so
common.

So here's my suggestion:

 - admit that the current situation actually causes problems, and has
_existing_ bugs.

 - throw it out the window, with the timeout and big BIG warning when
the problem cases trigger

 - add new GRND_SECURE and GRND_INSECURE flags that have the actual
useful behaviors that we currently pretty much lack

 - consider the old 0-3 flag values legacy, deprecated, and unsafe
because they _will_ time out to fix the existing problem we have right
now because of their bad behavior.

And stop with the "insecure is silly". Insecure is not silly, and in
fact should have been the default, because

 (a) insecure is and basically always will be the common case by far

 (b) insecure is the "not thinking about it" case and should thus be default

and that (b) is also very much why 0 should have been that insecure case.

Part of the problem is exactly the whole "_normally_ it just works, so
using 0 without thinking about it tests out well".

Which is why getrandom(0) is the main problem we face.

Because defaults matter.

               Linus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-16 17:44                                               ` Linus Torvalds
@ 2019-09-16 17:55                                                 ` Serge Belyshev
  2019-09-16 19:08                                                 ` Willy Tarreau
  2019-09-16 23:02                                                 ` Matthew Garrett
  2 siblings, 0 replies; 211+ messages in thread
From: Serge Belyshev @ 2019-09-16 17:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Theodore Y. Ts'o, Willy Tarreau,
	Vito Caputo, Ahmed S. Darwish, Lennart Poettering,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml


>  - add new GRND_SECURE and GRND_INSECURE flags that have the actual
> useful behaviors that we currently pretty much lack
>
>  - consider the old 0-3 flag values legacy, deprecated, and unsafe
> because they _will_ time out to fix the existing problem we have right
> now because of their bad behavior.

Just for the record because I did not see it mentioned in this thread,
this patch by Andy Lutomirski, posted two weeks ago, adds GRND_INSECURE
and makes GRND_RANDOM a no-op:

https://lore.kernel.org/lkml/cover.1567126741.git.luto@kernel.org/

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-16 17:21                                             ` Theodore Y. Ts'o
  2019-09-16 17:44                                               ` Linus Torvalds
@ 2019-09-16 18:00                                               ` Alexander E. Patrakov
  2019-09-16 19:53                                               ` Ahmed S. Darwish
  2019-09-17 15:32                                               ` Lennart Poettering
  3 siblings, 0 replies; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-16 18:00 UTC (permalink / raw)
  To: Theodore Y. Ts'o, Linus Torvalds
  Cc: Willy Tarreau, Vito Caputo, Ahmed S. Darwish, Lennart Poettering,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	zhangjs, linux-ext4, lkml


[-- Attachment #1: Type: text/plain, Size: 8154 bytes --]

16.09.2019 22:21, Theodore Y. Ts'o wrote:
> On Mon, Sep 16, 2019 at 09:17:10AM -0700, Linus Torvalds wrote:
>> So the semantics that getrandom() should have had are:
>>
>>   getrandom(0) - just give me reasonable random numbers for any of a
>> million non-strict-long-term-security use (ie the old urandom)
>>
>>      - the nonblocking flag makes no sense here and would be a no-op
> 
> That change is what I consider highly problematic.  There are a *huge*
> number of applications which use cryptography which assume that
> getrandom(0) means, "I'm guaranteed to get something safe for
> cryptographic use".  Changing this now would leave a very large number
> of applications insecure.  Part of the problem here is that
> there are many different actors.  There is the application or
> cryptographic library developer, who may want to be sure they have
> cryptographically secure random numbers.  They are the ones who will
> select getrandom(0).
> 
> Then you have the distribution or consumer-grade electronics
> developers who may choose to run them too early in some init script or
> systemd unit files.  And some of these people may do something stupid,
> like run things too early, or omit a hardware random number
> generator in their design, even though it's for a security critical
> purpose (say, a digital wallet for bitcoin).  Because some of these
> people might do something stupid, one argument (not mine) is that we
> must therefore not let getrandom() block.  But doing this penalizes
> the security of all the users of the application, not just the stupid
> ones.

On Linux, there is no such thing as "too early", that's the problem.

First, we already had one lesson about this, regarding applications that 
require libraries from /usr. There, it was due to various programs that 
run from udev rules, and dynamic/unpredictable dependencies. See 
https://freedesktop.org/wiki/Software/systemd/separate-usr-is-broken/, 
almost all arguments from there apply 1:1 here.

Second, people/distributions put unexpected stuff into their initramfs 
images, and we cannot say that they have no right to do so. E.g., on my 
system that's "cryptsetup" that unlocks the root partition, but manages 
to read a few bytes of uninitialized urandom before that. A warning here 
is almost unavoidable, and thus will be treated as SPAM.

No such considerations apply to OpenBSD (initramfs does not exist, and 
there is no equivalent of udev that reacts to cold-plug events by 
running programs), that's why the getentropy() design works there.

If we were to fix it, we should focus on making true entropy available 
unconditionally, even before /init in the initramfs starts, and warn not 
on the first access to urandom, but on the exec of /init. Look - 
distributions are already running "haveged" which harvests entropy from 
clock jitter. And they still manage to do it wrong (regardless whether 
the "haveged" idea is wrong by itself), by running it too late (at least 
I don't know any kind of stock initramfs with either it or rngd 
included). So it's too complex, and needs to be simplified.

The kernel already has jitterentropy-rng, it uses the same idea as 
"haveged", but, alas, it is exposed as a crypto rng algorithm, not a 
hwrng. And I think it is a bug: cryptoapi rng algorithms are for things 
that get a seed and generate random numbers by rehashing it over and 
over, while jitterentropy-rng requires no seed. Would a patch be 
accepted to convert it to hwrng? (this is essentially the reverse of 
what commit c46ea13 did for exynos-rng)
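
To see the split described above on a running system, one can compare
the RNG algorithms registered with the crypto API against the devices
registered with the hwrng framework. A rough sketch in Python follows;
the function names are made up for this example, and it only relies on
the /proc/crypto text format and the hw_random sysfs attribute.

```python
def rng_algorithms(proc_crypto_text: str) -> list:
    """Return the 'name' of every /proc/crypto entry whose type is 'rng'."""
    names = []
    # Entries in /proc/crypto are separated by blank lines.
    for block in proc_crypto_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields.get("type") == "rng" and "name" in fields:
            names.append(fields["name"])
    return names


def hwrng_available(
    path: str = "/sys/class/misc/hw_random/rng_available",
) -> list:
    """Return the hwrng device names, or [] if the file is not readable."""
    try:
        with open(path) as f:
            return f.read().split()
    except OSError:
        return []
```

Feeding the contents of /proc/crypto to rng_algorithms() should list
jitterentropy_rng on kernels that have it loaded, while
hwrng_available() would not list it, which is the situation the
question above is about.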

> 
>>   getrandom(GRND_RANDOM) - get me actual _secure_ random numbers with
>> blocking until entropy pool fills (but not the completely invalid
>> entropy decrease accounting)
>>
>>      - the nonblocking flag is useful for bootup and for "I will
>> actually try to generate entropy".
>>
>> and both of those are very very sensible actions. That would actually
>> have _fixed_ the problems we had with /dev/[u]random, both from a
>> performance standpoint and for a filesystem access standpoint.
>>
>> But that is sadly not what we have right now.
>>
>> And I suspect we can't fix it, since people have grown to depend on
>> the old behavior, and already know to avoid GRND_RANDOM because it's
>> useless with old kernels even if we fixed it with new ones.
> 
> I don't think we can fix it, because it's the changing of
> getrandom(0)'s behavior which is the problem, not GRND_RANDOM.  People
> *expect* getrandom(0) to always return secure results.  I don't think
> we can make it sometimes return not-necessarily secure results
> depending on when the systems integrator or distribution decides to
> run the application, and depending on the hardware platform (yes,
> traditional x86 systems are probably fine, and fortunately x86
> embedded CPU are too expensive and have lousy power management, so no
> one really uses x86 for embedded yet, despite Intel's best efforts).
> That would just be a purely irresponsible thing to do, IMO.
> 
>> Does anybody really seriously debate the above? Ted? Are you seriously
>> trying to claim that the existing GRND_RANDOM has any sensible use?
>> Are you seriously trying to claim that the fact that we don't have a
>> sane urandom source is a "feature"?
> 
> There are people who can debate that GRND_RANDOM has any sensible use
> cases.  GPG uses /dev/random, and that was a fully informed choice.
> I'm not convinced, because I think that at least for now the CRNG is
> perfectly fine for 99.999% of the use cases.  Yes, in a post-quantum
> cryptography world, the CRNG might be screwed --- but so will most of
> the other cryptographic algorithms in the kernel.  So if anyone ever
> gets post-quantum cryptoanalytic attacks working, the use of the CRNG
> is going to be the least of our problems.
> 
> As I mentioned to you in Lisbon, I've been going back and forth about
> whether or not to rip out the entire /dev/random infrastructure,
> mainly for code maintainability reasons.  The only reason why I've
> been holding back is because there are (very few) non-insane people
> who do want to use it.  There is also a much larger number of rational
> people who use it because they want some insane PCI compliance labs to
> go away.  What I suspect most of them are actually doing in practice
> is they use /dev/random, but they also use a hardware random number
> generator so /dev/random never actually blocks in practice.  The use
> of /dev/random is enough to make the PCI compliance lab go away, and
> the hardware random number generator (or virtio-rng on a VM) makes
> /dev/random useable.

Please don't forget about people who run Linux on Hyper-V, not on KVM, 
and thus have no access to virtio-rng ;)

> 
> But I don't think we can reuse GRND_RANDOM for that reason.
> 
> We could create a new flag, GRND_INSECURE, which never blocks.  And
> that allows us to solve the problem for silly applications that
> are using getrandom(2) for non-cryptographic use cases.  Use cases
> might include Python dictionary seeds, gdm for MIT Magic Cookie, UUID
> generation where best efforts probably is good enough, etc.  The
> answer today is they should just use /dev/urandom, since that exists
> today, and we have to support it for backwards compatibility anyway.
> It sounds like gdm recently switched to getrandom(2), and I suspect
> that it's going to get caught on some hardware configs anyway, even
> without the ext4 optimization patch.  So I suspect gdm will switch
> back to /dev/urandom, and this particular pain point will probably go
> away.
> 
> 						- Ted
> 

Well, at this point, I see that there is a lot of disagreement about how 
getrandom() should behave, aggravated by the baggage of existing 
applications and libraries with contradictory requirements regarding 
getrandom(0) (so not really solvable). I am almost convinced that we 
might want to return -ENOSYS unconditionally, and create a different 
system call with sane flags.

-- 
Alexander E. Patrakov


[-- Attachment #2: S/MIME cryptographic signature --]
[-- Type: application/pkcs7-signature, Size: 4052 bytes --]

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized
  2019-09-15 17:32                             ` [PATCH RFC v2] random: optionally block in getrandom(2) when the " Linus Torvalds
  2019-09-15 18:32                               ` Willy Tarreau
@ 2019-09-16 18:08                               ` Lennart Poettering
  2019-09-16 19:16                                 ` Willy Tarreau
  2019-09-18 21:15                               ` [PATCH RFC v4 0/1] random: WARN on large getrandom() waits and introduce getrandom2() Ahmed S. Darwish
  2 siblings, 1 reply; 211+ messages in thread
From: Lennart Poettering @ 2019-09-16 18:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Alexander E. Patrakov, Ahmed S. Darwish,
	Michael Kerrisk, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml

On Sun, 15.09.19 10:32, Linus Torvalds (torvalds@linux-foundation.org) wrote:

> [ Added Lennart, who was active in the other thread ]
>
> On Sat, Sep 14, 2019 at 10:22 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
> >
> > Thus, add an optional configuration option which stops getrandom(2)
> > from blocking, but instead returns "best efforts" randomness, which
> > might not be random or secure at all.
>
> So I hate having a config option for something like this.
>
> How about this attached patch instead? It only changes the waiting
> logic, and I'll quote the comment in full, because I think that
> explains not only the rationale, it explains every part of the patch
> (and is most of the patch anyway):
>
>  * We refuse to wait very long for a blocking getrandom().
>  *
>  * The crng may not be ready during boot, but if you ask for
>  * blocking random numbers very early, there is no guarantee
>  * that you'll ever get any timely entropy.
>  *
>  * If you are sure you need entropy and that you can generate
>  * it, you need to ask for non-blocking random state, and then
>  * if that fails you must actively _do_something_ that causes
>  * enough system activity, perhaps asking the user to type
>  * something on the keyboard.

You are requesting a UI change here. Maybe the kernel shouldn't be the
one figuring out UI.

I mean, as I understand you are unhappy with behaviour you saw on
systemd systems; we can certainly improve behaviour of systemd in
userspace alone, i.e. abort the getrandom() after a while in userspace
and log about it using typical userspace logging to the console. I am
not sure why you want to do all that in the kernel, the kernel isn't
great at user interaction, and really shouldn't be.

If all you want is to abort the getrandom() after 30s and a friendly
message on screen, by all means, let's add that to systemd, I have
zero problem with that. systemd has infrastructure for pushing that to
the user, the kernel doesn't really have that so nicely.

It appears to me you subscribe too much to an idea that userspace
people are not smart enough and couldn't implement something like
this. Turns out we can though, and there's no need to add logic that
appears to follow the logic of "never trust userspace"...

i.e. why not just consider this all just a feature request for the
systemd-random-seed.service, i.e. the service you saw the issue with
to handle this on its own?
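
To make the idea concrete, a userspace-side bound on the wait could
look roughly like the sketch below. It is written in Python for
brevity and is not systemd code; the function name, the 30s default
and the polling interval are placeholders chosen for the example.

```python
import os
import time


def getrandom_with_deadline(n: int, timeout_s: float = 30.0,
                            poll_s: float = 0.1):
    """Return n random bytes, or None if the CRNG is not ready in time."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Never block in the kernel; we do the waiting ourselves.
            return os.getrandom(n, os.GRND_NONBLOCK)
        except BlockingIOError:  # EAGAIN: CRNG not initialized yet
            if time.monotonic() >= deadline:
                return None
            time.sleep(poll_s)
```

A caller that gets None back can then log to the console, skip the
seed update, or ask the user for input, using whatever UI the
userspace component already has.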

> Hmm? No strange behavior. No odd config variables. A bounded total
> boot-time wait of 30s (which is a completely random number, but I
> claimed it as the "big red button" time).

As mentioned, in systemd's case it is entirely fine for updating the
random seed on disk to take 5h or so. I don't think we really need
to bound this in kernel space.

Lennart

--
Lennart Poettering, Berlin

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-16 17:44                                               ` Linus Torvalds
  2019-09-16 17:55                                                 ` Serge Belyshev
@ 2019-09-16 19:08                                                 ` Willy Tarreau
  2019-09-16 23:02                                                 ` Matthew Garrett
  2 siblings, 0 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-16 19:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Vito Caputo, Ahmed S. Darwish,
	Lennart Poettering, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	lkml

On Mon, Sep 16, 2019 at 10:44:31AM -0700, Linus Torvalds wrote:
>  - add new GRND_SECURE and GRND_INSECURE flags that have the actual
> useful behaviors that we currently pretty much lack
> 
>  - consider the old 0-3 flag values legacy, deprecated, and unsafe
> because they _will_ time out to fix the existing problem we have right
> now because of their bad behavior.

I think we can keep a flag to work like the current /dev/random and
deplete entropy for the very rare cases where it's really desired
to run this way (maybe even just for research), but it should require
special permissions as it impacts the whole system.

I think that your GRND_SECURE above means the current 0 situation,
where we wait for initial entropy and then don't wait anymore, right? If
so it could remain the default setting, because at least it will not
betray applications which rely on this reliability. And GRND_INSECURE
will be decided on a case by case basis by applications that are caught
waiting like sfdisk in initramfs or a MAC address generator for example.
In this case it could even be called GRND_PREDICTABLE maybe to enforce
its property compared to others.

My guess is that we can fix the situation because nobody likes the
problems that sporadically hit users. getrandom() was adopted quite
quickly to solve issues related to using /dev/*random in chroots,
I think the new flags will be adopted by those experiencing issues.

Just my two cents,
Willy


* Re: [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized
  2019-09-16 18:08                               ` Lennart Poettering
@ 2019-09-16 19:16                                 ` Willy Tarreau
  0 siblings, 0 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-16 19:16 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Linus Torvalds, Theodore Y. Ts'o, Alexander E. Patrakov,
	Ahmed S. Darwish, Michael Kerrisk, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4, lkml

On Mon, Sep 16, 2019 at 08:08:01PM +0200, Lennart Poettering wrote:
> I mean, as I understand you are unhappy with behaviour you saw on
> systemd systems; we can certainly improve behaviour of systemd in
> userspace alone, i.e. abort the getrandom() after a while in userspace
> and log about it using typical userspace logging to the console. I am
> not sure why you want to do all that in the kernel, the kernel isn't
> great at user interaction, and really shouldn't be.

Because the syscall will have the option to return what random data
was available in this case, while if you try to fix it only from
within systemd you currently don't even get that data.
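
To make the userspace-only variant concrete, here is a minimal sketch of
"abort the getrandom() after a while in userspace". It assumes a glibc that
provides <sys/random.h>; the helper name getrandom_deadline() and the 10 ms
polling step are made up for illustration and are not taken from systemd.
Note that on timeout the caller gets no bytes at all, which is the
limitation described above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper (not from systemd or the kernel): fill buf with len
 * bytes from a seeded CRNG, but stop waiting after roughly timeout_ms.
 * Returns 0 on success, -1 with errno set (ETIMEDOUT on timeout).
 */
int getrandom_deadline(void *buf, size_t len, long timeout_ms)
{
    unsigned char *p = buf;
    size_t done = 0;
    long waited_ms = 0;
    const long step_ms = 10;    /* arbitrary polling interval */
    const struct timespec step = { .tv_sec = 0, .tv_nsec = step_ms * 1000000L };

    assert(buf != NULL || len == 0);

    while (done < len) {
        /* GRND_NONBLOCK: fail with EAGAIN instead of sleeping in the kernel. */
        ssize_t n = getrandom(p + done, len - done, GRND_NONBLOCK);

        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno != EAGAIN)
            return -1;          /* e.g. ENOSYS on an old kernel */

        /* CRNG not yet initialized: wait a little, within the deadline. */
        if (waited_ms >= timeout_ms) {
            errno = ETIMEDOUT;
            return -1;
        }
        nanosleep(&step, NULL);
        waited_ms += step_ms;
    }
    return 0;
}
```

A caller would log to the console when this returns -1 with ETIMEDOUT and
then decide whether to retry, continue without a seed, or give up.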

> It appears to me you subscribe too much to an idea that userspace
> people are not smart enough and couldn't implement something like
> this. Turns out we can though, and there's no need to add logic that
> appears to follow the logic of "never trust userspace"...

I personally see this very differently. If random number generation was
placed into the kernel, whereas other operating systems do everything in
userspace, it's in part because gathering entropy requires collecting data
very widely, and no isolated userspace program alone can collect as much
as the kernel. Otherwise each program has to reimplement its own method,
each with its own bugs, instead of fixing them all in a single place. All
applications need random numbers; there's no reason to force them all to
implement this in detail.

Willy


* Re: Linux 5.3-rc8
  2019-09-16 17:21                                             ` Theodore Y. Ts'o
  2019-09-16 17:44                                               ` Linus Torvalds
  2019-09-16 18:00                                               ` Linux 5.3-rc8 Alexander E. Patrakov
@ 2019-09-16 19:53                                               ` Ahmed S. Darwish
  2019-09-17 15:32                                               ` Lennart Poettering
  3 siblings, 0 replies; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-16 19:53 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Linus Torvalds, Willy Tarreau, Vito Caputo, Lennart Poettering,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml

On Mon, Sep 16, 2019 at 01:21:17PM -0400, Theodore Y. Ts'o wrote:
> On Mon, Sep 16, 2019 at 09:17:10AM -0700, Linus Torvalds wrote:
> > So the semantics that getrandom() should have had are:
> > 
> >  getrandom(0) - just give me reasonable random numbers for any of a
> > million non-strict-long-term-security use (ie the old urandom)
> > 
> >     - the nonblocking flag makes no sense here and would be a no-op
> 
> That change is what I consider highly problematic.  There are a *huge*
> number of applications which use cryptography which assumes that
> getrandom(0) means, "I'm guaranteed to get something safe for
> cryptographic use".  Changing this now would expose a very large number
> of applications to be insecure.  Part of the problem here is that
> there are many different actors.  There is the application or
> cryptographic library developer, who may want to be sure they have
> cryptographically secure random numbers.  They are the ones who will
> select getrandom(0).
> 
> Then you have the distribution or consumer-grade electronics
> developers who may choose to run them too early in some init script or
> systemd unit files.  And some of these people may do something stupid,
> like run things too early, or omit a hardware random number
> generator in their design, even though it's for a security critical
> purpose (say, a digital wallet for bitcoin).

Ted, you're really the expert here. My apologies though, every time I
see the words "too early" I get a cramp... Please check my earlier
reply:

    https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc

Specifically the trace_printk log of all the getrandom(2) calls
during a standard Archlinux boot...

where is the "too early" boundary there? It's undefinable.

You either have entropy, or you don't. And if you don't, it will stay
like this forever, because if you had, you wouldn't have blocked in
the first place...

Thanks,

--
Ahmed Darwish
http://darwish.chasingpointers.com


* Re: Linux 5.3-rc8
  2019-09-16 17:44                                               ` Linus Torvalds
  2019-09-16 17:55                                                 ` Serge Belyshev
  2019-09-16 19:08                                                 ` Willy Tarreau
@ 2019-09-16 23:02                                                 ` Matthew Garrett
  2019-09-16 23:05                                                   ` Linus Torvalds
  2 siblings, 1 reply; 211+ messages in thread
From: Matthew Garrett @ 2019-09-16 23:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Willy Tarreau, Vito Caputo,
	Ahmed S. Darwish, Lennart Poettering, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Mon, Sep 16, 2019 at 10:44:31AM -0700, Linus Torvalds wrote:
>  - admit that the current situation actually causes problems, and has
> _existing_ bugs.
> 
>  - throw it out the window, with the timeout and big BIG warning when
> the problem cases trigger

The semantics many people want for secure key generation is urandom, but 
with a guarantee that it's seeded. getrandom()'s default behaviour at 
present provides that, and as a result it's used for a bunch of key 
generation. Changing the default (even with kernel warnings) seems like 
it risks people generating keys from an unseeded prng, and that seems 
like a bad thing?

It's definitely unfortunate that getrandom() doesn't have a GRND_URANDOM 
flag that would make it useful for the "I want some vaguely random 
numbers but I don't care that much and I don't necessarily have access 
to /dev/urandom" case, but at the moment we have no way of 
distinguishing between applications that are making this call because 
they want the semantics of urandom but need it to be seeded (which is 
one of the usecases getrandom() was introduced for in the first place) 
and applications that are making this call because it was convenient and 
the kernel usually ended up generating enough entropy in the first 
place. Given the ambiguity, I don't see an easy way to solve for the 
latter without breaking the former - and that could have some *very* bad 
outcomes.
 
-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: Linux 5.3-rc8
  2019-09-16 23:02                                                 ` Matthew Garrett
@ 2019-09-16 23:05                                                   ` Linus Torvalds
  2019-09-16 23:11                                                     ` Matthew Garrett
  2019-09-17  7:15                                                     ` a sane approach to random numbers (was: Re: Linux 5.3-rc8) Martin Steigerwald
  0 siblings, 2 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-16 23:05 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Theodore Y. Ts'o, Willy Tarreau, Vito Caputo,
	Ahmed S. Darwish, Lennart Poettering, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Mon, Sep 16, 2019 at 4:02 PM Matthew Garrett <mjg59@srcf.ucam.org> wrote:
>
> The semantics many people want for secure key generation is urandom, but
> with a guarantee that it's seeded.

And that is exactly what I'd suggest GRND_SECURE should do.

The problem with:

> getrandom()'s default behaviour at present provides that

is that exactly because it's the "default" (ie when you don't pass any
flags at all), that behavior is what all the random people get who do
*not* really intentionally want it, they just don't think about it.

> Changing the default (even with kernel warnings) seems like
> it risks people generating keys from an unseeded prng, and that seems
> like a bad thing?

I agree that it's a horrible thing, but the fact that the default 0
behavior had that "wait for entropy" is what now causes boot problems
for people.

             Linus


* Re: Linux 5.3-rc8
  2019-09-16 23:05                                                   ` Linus Torvalds
@ 2019-09-16 23:11                                                     ` Matthew Garrett
  2019-09-16 23:13                                                       ` Alexander E. Patrakov
  2019-09-16 23:18                                                       ` Linus Torvalds
  2019-09-17  7:15                                                     ` a sane approach to random numbers (was: Re: Linux 5.3-rc8) Martin Steigerwald
  1 sibling, 2 replies; 211+ messages in thread
From: Matthew Garrett @ 2019-09-16 23:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Willy Tarreau, Vito Caputo,
	Ahmed S. Darwish, Lennart Poettering, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Mon, Sep 16, 2019 at 04:05:47PM -0700, Linus Torvalds wrote:
> On Mon, Sep 16, 2019 at 4:02 PM Matthew Garrett <mjg59@srcf.ucam.org> wrote:
> > Changing the default (even with kernel warnings) seems like
> > it risks people generating keys from an unseeded prng, and that seems
> > like a bad thing?
> 
> I agree that it's a horrible thing, but the fact that the default 0
> behavior had that "wait for entropy" is what now causes boot problems
> for people.

In one case we have "Systems don't boot, but you can downgrade your 
kernel" and in the other case we have "Your cryptographic keys are weak 
and you have no way of knowing unless you read dmesg", and I think 
causing boot problems is the better outcome here.

-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: Linux 5.3-rc8
  2019-09-16 23:11                                                     ` Matthew Garrett
@ 2019-09-16 23:13                                                       ` Alexander E. Patrakov
  2019-09-16 23:15                                                         ` Matthew Garrett
  2019-09-16 23:18                                                       ` Linus Torvalds
  1 sibling, 1 reply; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-16 23:13 UTC (permalink / raw)
  To: Matthew Garrett, Linus Torvalds
  Cc: Theodore Y. Ts'o, Willy Tarreau, Vito Caputo,
	Ahmed S. Darwish, Lennart Poettering, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4, lkml


[-- Attachment #1: Type: text/plain, Size: 346 bytes --]

17.09.2019 04:11, Matthew Garrett wrote:
> In one case we have "Systems don't boot, but you can downgrade your
> kernel"

You can't. There are way too many dedicated server providers where there 
is no IPMI or any equivalent, and the only help that the staff can do is 
to reinstall, wiping your data.

-- 
Alexander E. Patrakov


[-- Attachment #2: S/MIME cryptographic signature --]
[-- Type: application/pkcs7-signature, Size: 4052 bytes --]


* Re: Linux 5.3-rc8
  2019-09-16 23:13                                                       ` Alexander E. Patrakov
@ 2019-09-16 23:15                                                         ` Matthew Garrett
  0 siblings, 0 replies; 211+ messages in thread
From: Matthew Garrett @ 2019-09-16 23:15 UTC (permalink / raw)
  To: Alexander E. Patrakov
  Cc: Linus Torvalds, Theodore Y. Ts'o, Willy Tarreau, Vito Caputo,
	Ahmed S. Darwish, Lennart Poettering, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4, lkml

On Tue, Sep 17, 2019 at 04:13:36AM +0500, Alexander E. Patrakov wrote:
> 17.09.2019 04:11, Matthew Garrett wrote:
> > In one case we have "Systems don't boot, but you can downgrade your
> > kernel"
> 
> You can't. There are way too many dedicated server providers where there is
> no IPMI or any equivalent, and the only help that the staff can do is to
> reinstall, wiping your data.

In which case you're presumably running a distro kernel that's had a 
decent amount of testing before you upgrade to it?

-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: Linux 5.3-rc8
  2019-09-16 23:11                                                     ` Matthew Garrett
  2019-09-16 23:13                                                       ` Alexander E. Patrakov
@ 2019-09-16 23:18                                                       ` Linus Torvalds
  2019-09-16 23:29                                                         ` Ahmed S. Darwish
                                                                           ` (2 more replies)
  1 sibling, 3 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-16 23:18 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Theodore Y. Ts'o, Willy Tarreau, Vito Caputo,
	Ahmed S. Darwish, Lennart Poettering, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Mon, Sep 16, 2019 at 4:11 PM Matthew Garrett <mjg59@srcf.ucam.org> wrote:
>
> In one case we have "Systems don't boot, but you can downgrade your
> kernel" and in the other case we have "Your cryptographic keys are weak
> and you have no way of knowing unless you read dmesg", and I think
> causing boot problems is the better outcome here.

Or: In one case you have a real and present problem. In the other
case, people are talking hypotheticals.

              Linus


* Re: Linux 5.3-rc8
  2019-09-16 23:18                                                       ` Linus Torvalds
@ 2019-09-16 23:29                                                         ` Ahmed S. Darwish
  2019-09-17  1:05                                                           ` Linus Torvalds
  2019-09-17  0:03                                                         ` Matthew Garrett
  2019-09-17  0:40                                                         ` Matthew Garrett
  2 siblings, 1 reply; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-16 23:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthew Garrett, Theodore Y. Ts'o, Willy Tarreau,
	Vito Caputo, Lennart Poettering, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Mon, Sep 16, 2019 at 04:18:00PM -0700, Linus Torvalds wrote:
> On Mon, Sep 16, 2019 at 4:11 PM Matthew Garrett <mjg59@srcf.ucam.org> wrote:
> >
> > In one case we have "Systems don't boot, but you can downgrade your
> > kernel" and in the other case we have "Your cryptographic keys are weak
> > and you have no way of knowing unless you read dmesg", and I think
> > causing boot problems is the better outcome here.
> 
> Or: In one case you have a real and present problem. In the other
> case, people are talking hypotheticals.
>

Linus, in all honesty, the other case is _not_ a hypothetical. For
example, here is a fresh comment on LWN from gnupg developers:

    https://lwn.net/Articles/799352

It's about this libgcrypt code:

    => https://dev.gnupg.org/source/libgcrypt.git

    => random/rndlinux.c:
    
    /* If we have a modern operating system, we first try to use the new
     * getentropy function.  That call guarantees that the kernel's
     * RNG has been properly seeded before returning any data.  This
     * is different from /dev/urandom which may, due to its
     * non-blocking semantics, return data even if the kernel has
     * not been properly seeded.  And it differs from /dev/random by never
     * blocking once the kernel is seeded.  */
    #if defined(HAVE_GETENTROPY) || defined(__NR_getrandom)
    do {
        ...
        ret = getentropy (buffer, nbytes);
        ...
    } while (ret == -1 && errno == EINTR);
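
For readers without the libgcrypt tree at hand, the elided loop can be
approximated by the following self-contained sketch. It is an assumption
about the general shape, not the actual libgcrypt code: the wrapper name
fill_seeded() is hypothetical, and the 256-byte chunking reflects the
documented per-call limit of getentropy(3).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>

/*
 * Hypothetical wrapper (not the libgcrypt implementation): fill buffer
 * with nbytes of output from a seeded kernel RNG.  getentropy() blocks
 * until the RNG is seeded and accepts at most 256 bytes per call.
 * Returns 0 on success, -1 with errno set on failure.
 */
int fill_seeded(void *buffer, size_t nbytes)
{
    unsigned char *p = buffer;

    assert(buffer != NULL || nbytes == 0);

    while (nbytes > 0) {
        size_t chunk = nbytes > 256 ? 256 : nbytes;
        int ret;

        /* Same retry-on-EINTR pattern as in the excerpt above. */
        do {
            ret = getentropy(p, chunk);
        } while (ret == -1 && errno == EINTR);

        if (ret == -1)
            return -1;

        p += chunk;
        nbytes -= chunk;
    }
    return 0;
}
```

The point for this discussion is that such a caller has no fallback path:
it relies entirely on the call not returning before the RNG is seeded.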

thanks,

-- 
Ahmed Darwish
http://darwish.chasingpointers.com


* Re: Linux 5.3-rc8
  2019-09-16 23:18                                                       ` Linus Torvalds
  2019-09-16 23:29                                                         ` Ahmed S. Darwish
@ 2019-09-17  0:03                                                         ` Matthew Garrett
  2019-09-17  0:40                                                         ` Matthew Garrett
  2 siblings, 0 replies; 211+ messages in thread
From: Matthew Garrett @ 2019-09-17  0:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Willy Tarreau, Vito Caputo,
	Ahmed S. Darwish, Lennart Poettering, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On 16 September 2019 16:18:00 GMT-07:00, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Mon, Sep 16, 2019 at 4:11 PM Matthew Garrett <mjg59@srcf.ucam.org>
>wrote:
>>
>> In one case we have "Systems don't boot, but you can downgrade your
>> kernel" and in the other case we have "Your cryptographic keys are
>weak
>> and you have no way of knowing unless you read dmesg", and I think
>> causing boot problems is the better outcome here.
>
>Or: In one case you have a real and present problem. In the other
>case, people are talking hypotheticals.

We've been recommending that people use getrandom() for key generation since it was first added to the kernel. Github suggests there are users in the wild - there's almost certainly more cases where internal code depends on the existing semantics.


-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: Linux 5.3-rc8
  2019-09-16 23:18                                                       ` Linus Torvalds
  2019-09-16 23:29                                                         ` Ahmed S. Darwish
  2019-09-17  0:03                                                         ` Matthew Garrett
@ 2019-09-17  0:40                                                         ` Matthew Garrett
  2 siblings, 0 replies; 211+ messages in thread
From: Matthew Garrett @ 2019-09-17  0:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Willy Tarreau, Vito Caputo,
	Ahmed S. Darwish, Lennart Poettering, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On 16 September 2019 16:18:00 GMT-07:00, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Mon, Sep 16, 2019 at 4:11 PM Matthew Garrett <mjg59@srcf.ucam.org>
>wrote:
>>
>> In one case we have "Systems don't boot, but you can downgrade your
>> kernel" and in the other case we have "Your cryptographic keys are
>weak
>> and you have no way of knowing unless you read dmesg", and I think
>> causing boot problems is the better outcome here.
>
>Or: In one case you have a real and present problem. In the other
>case, people are talking hypotheticals.

(resending because accidental HTML, sorry about that) 

We've been recommending that people use the default getrandom() behaviour for key generation since it was merged. Github shows users, and it's likely there's cases in internal code as well. 


-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: Linux 5.3-rc8
  2019-09-16 23:29                                                         ` Ahmed S. Darwish
@ 2019-09-17  1:05                                                           ` Linus Torvalds
  2019-09-17  1:23                                                             ` Matthew Garrett
  0 siblings, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-17  1:05 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Matthew Garrett, Theodore Y. Ts'o, Willy Tarreau,
	Vito Caputo, Lennart Poettering, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Mon, Sep 16, 2019 at 4:29 PM Ahmed S. Darwish <darwish.07@gmail.com> wrote:
>
> Linus, in all honesty, the other case is _not_ a hypothetical.

Oh yes it is.

You're confusing "use" with "breakage".

The _use_ of getrandom(0) for key generation isn't hypothetical.

But the _breakage_ from the suggested patch that makes it time out is.

See the difference?

The thing is, to break, you have to

 (a) do that key generation at boot time

 (b) do it on an idle machine that doesn't have entropy

in order to basically reproduce the current boot-time hang situation
with the broken gdm, except with an actual "generate key".

Then you have to ignore the big warning too.

              Linus


* Re: Linux 5.3-rc8
  2019-09-17  1:05                                                           ` Linus Torvalds
@ 2019-09-17  1:23                                                             ` Matthew Garrett
  2019-09-17  1:41                                                               ` Linus Torvalds
  0 siblings, 1 reply; 211+ messages in thread
From: Matthew Garrett @ 2019-09-17  1:23 UTC (permalink / raw)
  To: Linus Torvalds, Ahmed S. Darwish
  Cc: Theodore Y. Ts'o, Willy Tarreau, Vito Caputo,
	Lennart Poettering, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	lkml

On 16 September 2019 18:05:57 GMT-07:00, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Mon, Sep 16, 2019 at 4:29 PM Ahmed S. Darwish <darwish.07@gmail.com>
>wrote:
>>
>> Linus, in all honesty, the other case is _not_ a hypothetical.
>
>Oh yes it is.
>
>You're confusing "use" with "breakage".
>
>The _use_ of getrandom(0) for key generation isn't hypothetical.
>
>But the _breakage_ from the suggested patch that makes it time out is.
>
>See the difference?
>
>The thing is, to break, you have to
>
> (a) do that key generation at boot time
>
> (b) do it on an idle machine that doesn't have entropy

Exactly the scenario where you want getrandom() to block, yes. 

>in order to basically reproduce the current boot-time hang situation
>with the broken gdm, except with an actual "generate key".
>
>Then you have to ignore the big warning too.

The big warning that's only printed in dmesg? 


-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: Linux 5.3-rc8
  2019-09-17  1:23                                                             ` Matthew Garrett
@ 2019-09-17  1:41                                                               ` Linus Torvalds
  2019-09-17  1:46                                                                 ` Matthew Garrett
  0 siblings, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-17  1:41 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Ahmed S. Darwish, Theodore Y. Ts'o, Willy Tarreau,
	Vito Caputo, Lennart Poettering, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Mon, Sep 16, 2019 at 6:24 PM Matthew Garrett <mjg59@srcf.ucam.org> wrote:
>
> Exactly the scenario where you want getrandom() to block, yes.

It *would* block. Just not forever.

And btw, the whole "generate key at boot when nothing else is going
on" is already broken, so presumably nobody actually does it.

See why I'm saying "hypothetical"? You're doing it again.

> >Then you have to ignore the big warning too.
>
> The big warning that's only printed in dmesg?

Well, the patch actually made getrandom() return an error too, but you
seem more interested in the hypotheticals than in arguing actualities.

          Linus


* Re: Linux 5.3-rc8
  2019-09-17  1:41                                                               ` Linus Torvalds
@ 2019-09-17  1:46                                                                 ` Matthew Garrett
  2019-09-17  5:24                                                                   ` Willy Tarreau
  0 siblings, 1 reply; 211+ messages in thread
From: Matthew Garrett @ 2019-09-17  1:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ahmed S. Darwish, Theodore Y. Ts'o, Willy Tarreau,
	Vito Caputo, Lennart Poettering, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On 16 September 2019 18:41:36 GMT-07:00, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Mon, Sep 16, 2019 at 6:24 PM Matthew Garrett <mjg59@srcf.ucam.org>
>wrote:
>>
>> Exactly the scenario where you want getrandom() to block, yes.
>
>It *would* block. Just not forever.

It's already not forever - there's enough running in the background of that system that it'll unblock eventually. 

>And btw, the whole "generate key at boot when nothing else is going
>on" is already broken, so presumably nobody actually does it.

If nothing ever did this, why was getrandom() designed in a way to protect against this situation? 

>See why I'm saying "hypothetical"? You're doing it again.
>
>> >Then you have to ignore the big warning too.
>>
>> The big warning that's only printed in dmesg?
>
>Well, the patch actually made getrandom() return an error too, but you
>seem more interested in the hypotheticals than in arguing actualities.

If you want to be safe, terminate the process.


-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: Linux 5.3-rc8
  2019-09-17  1:46                                                                 ` Matthew Garrett
@ 2019-09-17  5:24                                                                   ` Willy Tarreau
  2019-09-17  7:33                                                                     ` Martin Steigerwald
  0 siblings, 1 reply; 211+ messages in thread
From: Willy Tarreau @ 2019-09-17  5:24 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Linus Torvalds, Ahmed S. Darwish, Theodore Y. Ts'o,
	Vito Caputo, Lennart Poettering, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Mon, Sep 16, 2019 at 06:46:07PM -0700, Matthew Garrett wrote:
> >Well, the patch actually made getrandom() return an error too, but you
> >seem more interested in the hypotheticals than in arguing actualities.
> 
> If you want to be safe, terminate the process.

This is an interesting approach. At least it will cause bug reports in
applications using getrandom() in an unreliable way and they will check
for other options. Because one of the issues with systems that do not
finish booting is that usually the user doesn't know what process is
hanging.

Anyway regarding the impact on applications relying on getrandom() for
security, I'm in favor of not *silently* changing their behavior and
of providing a new flag to help others get insecure randoms without waiting.

With your option above we could then have this way to go:

  - GRND_SECURE: the application wants secure randoms, i.e. like
    the current getrandom(0), waiting for entropy.

  - GRND_INSECURE: the application never wants to wait, it just
    wants a replacement for /dev/urandom.

  - GRND_RANDOM: unchanged, or subject to CAP_xxx, or maybe just emit
    a "deprecated" warning if called without a certain capability, to
    spot potentially harmful applications.

  - by default (0), the application continues to wait but when the
    timeout strikes (30 seconds?), it gets terminated with a
    message in the logs for users to report the issue.

After some time all relevant applications which accidentally misuse
getrandom() will be fixed to either use GRND_INSECURE or GRND_SECURE
and be able to wait longer if they want (likely SECURE|NONBLOCK).
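
As a rough illustration of the split proposed above, the sketch below maps
the two intents onto what exists today. GRND_SECURE is only a proposal in
this thread, so the code deliberately avoids it: "secure" is expressed as
getrandom() with flags 0, and "insecure" as GRND_NONBLOCK with a fallback
to reading /dev/urandom. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names standing in for the proposed SECURE/INSECURE intents. */
enum rand_need {
    RAND_SEEDED,        /* keys etc.: wait for the CRNG, like getrandom(0) */
    RAND_BEST_EFFORT    /* never wait: /dev/urandom-like behaviour */
};

/* Read len bytes from /dev/urandom, which does not block at boot. */
static int read_urandom(unsigned char *p, size_t len)
{
    int fd = open("/dev/urandom", O_RDONLY);

    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = read(fd, p, len);

        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    close(fd);
    return 0;
}

/* Returns 0 on success, -1 on failure. */
int random_bytes(enum rand_need need, void *buf, size_t len)
{
    unsigned char *p = buf;
    unsigned int flags = (need == RAND_SEEDED) ? 0 : GRND_NONBLOCK;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = getrandom(p, len, flags);

        if (n > 0) {
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        /* CRNG not seeded yet and the caller said it does not care. */
        if (n < 0 && errno == EAGAIN && need == RAND_BEST_EFFORT)
            return read_urandom(p, len);
        return -1;
    }
    return 0;
}
```

The design choice here is that the caller states its intent explicitly, so
a MAC address generator would pass RAND_BEST_EFFORT while a key generator
would pass RAND_SEEDED and accept the wait.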

Willy


* a sane approach to random numbers (was: Re: Linux 5.3-rc8)
  2019-09-16 23:05                                                   ` Linus Torvalds
  2019-09-16 23:11                                                     ` Matthew Garrett
@ 2019-09-17  7:15                                                     ` Martin Steigerwald
  1 sibling, 0 replies; 211+ messages in thread
From: Martin Steigerwald @ 2019-09-17  7:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthew Garrett, Theodore Y. Ts'o, Willy Tarreau,
	Vito Caputo, Ahmed S. Darwish, Lennart Poettering,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml

As this is not about Linux 5.3-rc8 anymore I took the liberty to change 
the subject.

Linus Torvalds - 17.09.19, 01:05:47 CEST:
> On Mon, Sep 16, 2019 at 4:02 PM Matthew Garrett <mjg59@srcf.ucam.org> 
> wrote:
> > The semantics many people want for secure key generation is urandom,
> > but with a guarantee that it's seeded.
> 
> And that is exactly what I'd suggest GRND_SECURE should do.
> 
> The problem with:
> > getrandom()'s default behaviour at present provides that
> 
> is that exactly because it's the "default" (ie when you don't pass any
> flags at all), that behavior is what all the random people get who do
> *not* really intentionally want it, they just don't think about it.
> > Changing the default (even with kernel warnings) seems like
> > it risks people generating keys from an unseeded prng, and that
> > seems
> > like a bad thing?
> 
> I agree that it's a horrible thing, but the fact that the default 0
> behavior had that "wait for entropy" is what now causes boot problems
> for people.

Seeing all the discussion, I just got the impression that it may be best 
to start from scratch. To stop trying to fix something that was broken to 
begin with – at least that was what I got from the discussion here.

Do a sane API with new function names, new flag names and over time 
deprecate the old one completely so that one day it hopefully could be 
gradually disabled until it can be removed. Similar to statx() 
replacing stat() someday, hopefully.

And write some documentation about how userspace developers are to use 
it. For example: if the kernel says it has no randomness yet, do not block 
or poll on it, but do something to generate entropy.
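
A minimal sketch of what such documentation might show: a caller can ask
getrandom(2), with the existing GRND_NONBLOCK flag, whether the CRNG is
seeded yet instead of blocking on it. The helper name try_kernel_random()
and its return convention are made up for this illustration, and what the
caller does while the CRNG is still unseeded is deliberately left open.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not an existing API.
 * Returns  1 if buf was filled from the seeded CRNG,
 *          0 if the CRNG is not seeded yet (the call would have blocked),
 *         -1 on any other failure.
 */
int try_kernel_random(void *buf, size_t len)
{
	ssize_t n;

	/* Once the CRNG is seeded, requests of up to 256 bytes are not
	 * returned partially, so a short read needs no retry loop here. */
	assert(len <= 256);

	n = getrandom(buf, len, GRND_NONBLOCK);
	if (n == (ssize_t)len)
		return 1;
	if (n < 0 && errno == EAGAIN)
		return 0;	/* not seeded yet: do something else, do not spin */
	return -1;
}
```

A return value of 0 is the point where an application would defer key
generation or report the situation, rather than looping on the call.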

But maybe that is naive, too.

However, in the end, whatever you kernel developers come up with, 
I bet there will be no way to make the kernel control userspace 
developers. Yet I have the impression that this is what you are attempting 
here. As long as you have an API to obtain guaranteed random 
numbers, or at least somewhat guaranteed random numbers, that is not 
directly available at boot time, userspace could poll on its 
availability. At least as long as the kernel is honest about its 
unavailability and reports it. And if it is not, applications that 
*require* random numbers can never know whether they got some from the 
kernel.

Maybe you can make an API that is hard to abuse, yes. And that is good. 
But impossible?

I wonder: What could the Linux experience look like if kernel developers 
and userspace developers actually worked together instead of finding ways 
to fight each other? I mean, for the most common userspace applications 
in the free software continuum, there would not be all that many people 
to talk with, or would there? It is basically gdm, sddm, probably some 
other display managers, SSH, GnuPG and probably a few more. For 
example, for gdm someone could open a bug report about its use of the 
current API and ask it to use something that is non-blocking? And does 
Systemd really need to deplete the random pool early at boot in order to 
generate UUIDs? Even though I do not use GNOME, I'd be willing to help 
by filing a few bug reports here and there. AFAIR there has been 
something similar with sddm, which I used, but I believe it has 
been fixed there already.

Sometimes I wonder what would happen if kernel and userspace developers 
actually *talk* to each other, or better *with* each other.

But instead, for example, Lennart appears to be afraid to interact 
with the kernel community, and some kernel developers just talked about 
personalities that they find difficult to interact with, judging them to be 
like this and like that.

There is a social, soft skill issue here that no amount of technical 
excellence will resolve. That is at least how I observe this.

Does it make it easier? Probably not. I fully appreciate that some 
people may have a difficult time talking with each other; I have 
experienced this myself often enough. I did not report a bug I recently 
found in Systemd, just because I do not want to repeat the experience I had 
when I reported bugs about it before, and I do not use it anymore 
personally anyway. So I totally get that.

However… not talking with each other is not going to resolve those 
cases where userspace uses a kernel API in a way kernel developers do not 
agree with and that causes issues like stalled boots. Because basically 
userspace can abuse any kernel API, and in the end the kernel can do 
nothing about it.

Of course feel free to ignore this, if you think it is not useful.

Thanks,
-- 
Martin




* Re: Linux 5.3-rc8
  2019-09-17  5:24                                                                   ` Willy Tarreau
@ 2019-09-17  7:33                                                                     ` Martin Steigerwald
  2019-09-17  8:35                                                                       ` Willy Tarreau
                                                                                         ` (2 more replies)
  0 siblings, 3 replies; 211+ messages in thread
From: Martin Steigerwald @ 2019-09-17  7:33 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Matthew Garrett, Linus Torvalds, Ahmed S. Darwish,
	Theodore Y. Ts'o, Vito Caputo, Lennart Poettering,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml

Willy Tarreau - 17.09.19, 07:24:38 CEST:
> On Mon, Sep 16, 2019 at 06:46:07PM -0700, Matthew Garrett wrote:
> > >Well, the patch actually made getrandom() return an error too, but
> > >you seem more interested in the hypotheticals than in arguing
> > >actualities.
> > If you want to be safe, terminate the process.
> 
> This is an interesting approach. At least it will cause bug reports in
> application using getrandom() in an unreliable way and they will
> check for other options. Because one of the issues with systems that
> do not finish to boot is that usually the user doesn't know what
> process is hanging.

A userspace process could just poll on the kernel by forking a process 
to use getrandom() and waiting until it does not get terminated anymore. 
And then it would still hang.

So yes, that would make it harder to abuse the API, but not 
impossible. Which may still be good, I don't know.

Either the kernel does not reveal at all whether it has seeded the CRNG 
and leaves GnuPG, OpenSSH and others in the dark, or it does and risks that 
userspace does stupid things, whether it prints a big fat warning or not.

Of course the warning could be worded like:

process blocking on entropy too early on boot without giving the kernel 
much chance to gather entropy. this is not a kernel issue, report to 
userspace developers

And probably then kill the process, so at least users will know.

However this again would be burdening users with an issue they should 
not have to care about. Unless userspace developers care enough and 
manage to take time to fix the issue before updated kernels come to their 
systems. Cause again it would be users' systems that would not be 
working. Just cause kernel and userspace developers did not agree and 
chose to fight with each other instead of talking *with* each other.

At least with killing gdm Systemd may restart it if configured to do so. 
But if it doesn't, the user is again stuck with a non working system 
until restarting gdm themselves.

It may still make sense to make the API harder to use, but it does not 
replace talking with userspace developers and it would need some time to 
allow for adapting userspace applications and services.

-- 
Martin




* Re: Linux 5.3-rc8
  2019-09-17  7:33                                                                     ` Martin Steigerwald
@ 2019-09-17  8:35                                                                       ` Willy Tarreau
  2019-09-17  8:44                                                                         ` Martin Steigerwald
  2019-09-17 12:11                                                                       ` Theodore Y. Ts'o
  2019-09-17 16:27                                                                       ` Linus Torvalds
  2 siblings, 1 reply; 211+ messages in thread
From: Willy Tarreau @ 2019-09-17  8:35 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Matthew Garrett, Linus Torvalds, Ahmed S. Darwish,
	Theodore Y. Ts'o, Vito Caputo, Lennart Poettering,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml

On Tue, Sep 17, 2019 at 09:33:40AM +0200, Martin Steigerwald wrote:
> However this again would be burdening users with an issue they should 
> not have to care about. Unless userspace developers care enough and 
> manage to take time to fix the issue before updated kernels come to their 
> systems. Cause again it would be users systems that would not be 
> working. Just cause kernel and userspace developers did not agree and 
> chose to fight with each other instead of talking *with* each other.

It has nothing to do with fighting at all, it has to do with offering
what applications *need* without breaking existing assumptions that
make most applications work. And more importantly it involves not
silently breaking applications which need good randomness for long
lived keys because the breakage will not be visible initially and can
hit them hard later. Right now most applications which block in the
early stages are only victims of the current situation and their
developers possibly didn't understand the possible impacts of a lack
of entropy (or how real an issue it was). These applications do need
to be able to get low-quality random numbers without blocking forever,
provided these are not accidentally used by those who need security. At
some point, just like for any syscall, the doc makes the difference.

> At least with killing gdm Systemd may restart it if configured to do so. 
> But if it doesn't, the user is again stuck with a non working system 
> until restarting gdm themselves.
> 
> It may still make sense to make the API harder to use,

No. What is hard to use is often misused. It must be harder to misuse
it, which means it should be easier to correctly use it. The choice of
flag names and the emission of warnings definitely helps during the
development stage.

> but it does not 
> replace talking with userspace developers and it would need some time to 
> allow for adapting userspace applications and services.

Which is how adding new flags can definitely help even if adoption takes
time. By the way in this discussion I am a userspace developer and have
been hit several times by libraries switching to getrandom() that silently
failed to respond in the field. As a userspace developer, I really want to see
a solution to this problem. And I'm fine if the kernel decides to kill
haproxy for using getrandom() with the old settings, at least users will
notice, will complain to me and will update.

Willy


* Re: Linux 5.3-rc8
  2019-09-17  8:35                                                                       ` Willy Tarreau
@ 2019-09-17  8:44                                                                         ` Martin Steigerwald
  0 siblings, 0 replies; 211+ messages in thread
From: Martin Steigerwald @ 2019-09-17  8:44 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Matthew Garrett, Linus Torvalds, Ahmed S. Darwish,
	Theodore Y. Ts'o, Vito Caputo, Lennart Poettering,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml

Willy Tarreau - 17.09.19, 10:35:16 CEST:
> On Tue, Sep 17, 2019 at 09:33:40AM +0200, Martin Steigerwald wrote:
> > However this again would be burdening users with an issue they
> > should
> > not have to care about. Unless userspace developers care enough and
> > manage to take time to fix the issue before updated kernels come to
> > their systems. Cause again it would be users systems that would not
> > be working. Just cause kernel and userspace developers did not
> > agree and chose to fight with each other instead of talking *with*
> > each other.
> It has nothing to do with fighting at all, it has to do with offering
> what applications *need* without breaking existing assumptions that
> make most applications work. And more importantly it involves not
[…]

Well I got the impression or interpretation that it would be about 
fighting… if it is not, all the better!

> > At least with killing gdm Systemd may restart it if configured to do
> > so. But if it doesn't, the user is again stuck with a non working
> > system until restarting gdm themselves.
> > 
> > It may still make sense to make the API harder to use,
> 
> No. What is hard to use is often misused. It must be harder to misuse
> it, which means it should be easier to correctly use it. The choice of
> flag names and the emission of warnings definitely helps during the
> development stage.

Sorry, this was a typo of mine. I actually meant harder to abuse. 
Anything else would not make sense in the context of what I have 
written.

Make it easier to use properly and harder to abuse.

> > but it does not
> > replace talking with userspace developers and it would need some
> > time to allow for adapting userspace applications and services.
> 
> Which is how adding new flags can definitely help even if adoption
> takes time. By the way in this discussion I am a userspace developer
> and have been hit several times by libraries switching to getrandom()
> that silently failed to respond in field. As a userspace developer, I
> really want to see a solution to this problem. And I'm fine if the
> kernel decides to kill haproxy for using getrandom() with the old
> settings, at least users will notice, will complain to me and will
> update.

Good to see that you are also engaging as a userspace developer in the 
discussion.

Thanks,
-- 
Martin




* Re: Linux 5.3-rc8
  2019-09-17  7:33                                                                     ` Martin Steigerwald
  2019-09-17  8:35                                                                       ` Willy Tarreau
@ 2019-09-17 12:11                                                                       ` Theodore Y. Ts'o
  2019-09-17 12:30                                                                         ` Ahmed S. Darwish
                                                                                           ` (2 more replies)
  2019-09-17 16:27                                                                       ` Linus Torvalds
  2 siblings, 3 replies; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-17 12:11 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Willy Tarreau, Matthew Garrett, Linus Torvalds, Ahmed S. Darwish,
	Vito Caputo, Lennart Poettering, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Tue, Sep 17, 2019 at 09:33:40AM +0200, Martin Steigerwald wrote:
> Willy Tarreau - 17.09.19, 07:24:38 CEST:
> > On Mon, Sep 16, 2019 at 06:46:07PM -0700, Matthew Garrett wrote:
> > > >Well, the patch actually made getrandom() return an error too, but
> > > >you seem more interested in the hypotheticals than in arguing
> > > >actualities.
> > > If you want to be safe, terminate the process.
> > 
> > This is an interesting approach. At least it will cause bug reports in
> > application using getrandom() in an unreliable way and they will
> > check for other options. Because one of the issues with systems that
> > do not finish to boot is that usually the user doesn't know what
> > process is hanging.
> 

I would be happy with a change which changes getrandom(0) to send a
kill -9 to the process if it is called too early, with a new flag,
getrandom(GRND_BLOCK) which blocks until entropy is available.  That
leaves it up to the application developer to decide what behavior they
want.

Userspace applications which want to do something more sophisticated
could set a timer which will cause getrandom(GRND_BLOCK) to return
with EINTR (or the signal handler could use longjmp; whatever) to
abort and do something else, like calling random_r if it's for some
pathetic use of random numbers like MIT-MAGIC-COOKIE.
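
The timer idea can be sketched roughly as follows. GRND_BLOCK is only a
proposal in this thread, so the sketch uses plain getrandom(buf, len, 0),
which is what currently blocks until the CRNG is seeded. The helper name
getrandom_with_timeout() and its return values are hypothetical. The
SIGALRM handler is installed without SA_RESTART so that the blocked call
returns EINTR when the timer fires instead of being restarted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

/* Empty handler: its only job is to interrupt the blocked system call. */
static void on_alarm(int sig)
{
	(void)sig;
}

/*
 * Hypothetical helper, not an existing API.
 * Returns  1 if buf was filled with kernel randomness,
 *          0 if the timer fired first (caller picks a fallback),
 *         -1 on any other failure.
 * seconds must be at least 1; alarm(0) would arm no timer at all.
 */
int getrandom_with_timeout(void *buf, size_t len, unsigned int seconds)
{
	struct sigaction sa, old;
	ssize_t n;
	int saved_errno;

	assert(len <= 256 && seconds >= 1);

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;	/* sa_flags == 0: no SA_RESTART */
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, &old) < 0)
		return -1;

	alarm(seconds);
	n = getrandom(buf, len, 0);	/* may block until the CRNG is seeded */
	saved_errno = errno;
	alarm(0);			/* cancel the timer if still pending */
	sigaction(SIGALRM, &old, NULL);	/* restore the previous handler */
	errno = saved_errno;

	if (n == (ssize_t)len)
		return 1;
	if (n < 0 && errno == EINTR)
		return 0;
	return -1;
}
```

On a return of 0 the caller decides what "something else" means: a
non-cryptographic generator such as random_r for cookie-grade uses, or
refusing to proceed when real key material is needed.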

> A userspace process could just poll on the kernel by forking a process 
> to use getrandom() and waiting until it does not get terminated anymore. 
> And then it would still hang.

So.... I'm not too worried about that, because if a process is
determined to do something stupid, they can always do something
stupid.

This could potentially be a problem, as would GRND_BLOCK: suppose
an application author decides to use it to wait for real
randomness because, in the good judgement of the application author,
it d*mned well needs real security, since otherwise an attacker could,
say, force a launch of nuclear weapons and cause world war III. If
some small 3rd-tier distro then decides to repurpose that application
for some other use and puts it in early boot, it's possible that a
user will report it as a "regression", and we'll be back to the
question of whether we revert a performance optimization patch.

There are only two ways out of this mess.  The first option is we take
functionality away from a userspace author who Really Wants A Secure
Random Number Generator.  And there are an awful lot of programs that
really want secure crypto, because this is not a hypothetical.  The
result in "Mining your P's and Q's" did happen before.  If we forget
the history, we are doomed to repeat it.

The only other way is that we need to try to get the CRNG initialized
securely in early boot, before we let userspace start.  If we do it
early enough, we can also make the kernel facilities like KASLR and
Stack Canaries more secure.  And this is *doable*, at least for most
common platforms.  We can leverage UEFI; we can try to use the TPM's
random number generator, etc.  It won't help so much for certain
brain-dead architectures, like MIPS and ARM, but if they are used for
embedded use cases, it will be caught before the product is released
for consumer use.  And this is where blocking is *way* better than a
big fat warning, or sleeping for 15 seconds, both of which can easily
get missed in the embedded case.  If we can fix this for traditional
servers/desktops/laptops, then users won't be complaining to Linus,
and I think we can all be happy.

Regards,

					- Ted


* Re: Linux 5.3-rc8
  2019-09-17 12:11                                                                       ` Theodore Y. Ts'o
@ 2019-09-17 12:30                                                                         ` Ahmed S. Darwish
  2019-09-17 12:46                                                                           ` Alexander E. Patrakov
                                                                                             ` (2 more replies)
  2019-09-17 13:11                                                                         ` Alexander E. Patrakov
  2019-09-17 15:57                                                                         ` Lennart Poettering
  2 siblings, 3 replies; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-17 12:30 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Martin Steigerwald, Willy Tarreau, Matthew Garrett,
	Linus Torvalds, Vito Caputo, Lennart Poettering, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

On Tue, Sep 17, 2019 at 08:11:56AM -0400, Theodore Y. Ts'o wrote:
> On Tue, Sep 17, 2019 at 09:33:40AM +0200, Martin Steigerwald wrote:
> > Willy Tarreau - 17.09.19, 07:24:38 CEST:
> > > On Mon, Sep 16, 2019 at 06:46:07PM -0700, Matthew Garrett wrote:
> > > > >Well, the patch actually made getrandom() return an error too, but
> > > > >you seem more interested in the hypotheticals than in arguing
> > > > >actualities.
> > > > If you want to be safe, terminate the process.
> > >
> > > This is an interesting approach. At least it will cause bug reports in
> > > application using getrandom() in an unreliable way and they will
> > > check for other options. Because one of the issues with systems that
> > > do not finish to boot is that usually the user doesn't know what
> > > process is hanging.
> >
>
> I would be happy with a change which changes getrandom(0) to send a
> kill -9 to the process if it is called too early, with a new flag,
> getrandom(GRND_BLOCK) which blocks until entropy is available.  That
> leaves it up to the application developer to decide what behavior they
> want.
>

Yup, I'm convinced that's the sanest option too. I'll send a final RFC
patch tonight implementing the following:

config GETRANDOM_CRNG_ENTROPY_MAX_WAIT_MS
	int
	default 3000
	help
	  Default max wait in milliseconds, for the getrandom(2) system
	  call when asking for entropy from the urandom source, until
	  the Cryptographic Random Number Generator (CRNG) gets
	  initialized.  Any process exceeding this duration for entropy
	  wait will get killed by kernel. The maximum wait can be
	  overridden through the "random.getrandom_max_wait_ms" kernel
	  boot parameter. Rationale follows.

	  When the getrandom(2) system call was created, it came with
	  the clear warning: "Any userspace program which uses this new
	  functionality must take care to assure that if it is used
	  during the boot process, that it will not cause the init
	  scripts or other portions of the system startup to hang
	  indefinitely."

	  Unfortunately, due to multiple factors, including not having
	  this warning written in a scary enough language in the
	  manpages, and due to glibc since v2.25 implementing a BSD-like
	  getentropy(3) in terms of getrandom(2), modern user-space is
	  calling getrandom(2) in the boot path everywhere.

	  Embedded Linux systems were first hit by this, and reports of
	  embedded system "getting stuck at boot" began to be
	  common. Over time, the issue began to even creep into consumer
	  level x86 laptops: mainstream distributions, like Debian
	  Buster, began to recommend installing haveged as a workaround,
	  just to let the system boot.

	  Filesystem optimizations in EXT4 and XFS exacerbated the
	  problem, due to aggressive batching of IO requests, and thus
	  minimizing sources of entropy at boot. This led to large
	  delays until the kernel's Cryptographic Random Number
	  Generator (CRNG) got initialized, and thus having reports of
	  getrandom(2) indefinitely stuck at boot.

	  Solve this problem by setting a conservative upper bound for
	  getrandom(2) wait. Kill the process, instead of returning an
	  error code, because otherwise crypto-sensitive applications
	  may revert to less secure mechanisms (e.g. /dev/urandom). We
	  __deeply encourage__ system integrators and distribution
	  builders not to considerably increase this value: during
	  system boot, you either have entropy, or you don't. And if you
	  didn't have entropy, it will stay like this forever, because
	  if you had, you wouldn't have blocked in the first place. It's
	  an atomic "either/or" situation, with no middle ground. Please
	  think twice.

	  Ideally, systems would be configured with hardware random
	  number generators, and/or configured to trust the CPU-provided
	  RNG's (CONFIG_RANDOM_TRUST_CPU) or boot-loader provided ones
	  (CONFIG_RANDOM_TRUST_BOOTLOADER).  In addition, userspace
	  should generate cryptographic keys only as late as possible,
	  when they are needed, instead of during early boot.  (For
	  non-cryptographic use cases, such as dictionary seeds or MIT
	  Magic Cookies, other mechanisms such as /dev/urandom or
	  random(3) may be more appropriate.)

Sounds good?

thanks,

--
Ahmed Darwish
http://darwish.chasingpointers.com


* Re: Linux 5.3-rc8
  2019-09-17 12:30                                                                         ` Ahmed S. Darwish
@ 2019-09-17 12:46                                                                           ` Alexander E. Patrakov
  2019-09-17 12:47                                                                           ` Willy Tarreau
  2019-09-17 16:08                                                                           ` Lennart Poettering
  2 siblings, 0 replies; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-17 12:46 UTC (permalink / raw)
  To: Ahmed S. Darwish, Theodore Y. Ts'o
  Cc: Martin Steigerwald, Willy Tarreau, Matthew Garrett,
	Linus Torvalds, Vito Caputo, Lennart Poettering, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, zhangjs, linux-ext4,
	lkml


[-- Attachment #1: Type: text/plain, Size: 5047 bytes --]

17.09.2019 17:30, Ahmed S. Darwish wrote:
> On Tue, Sep 17, 2019 at 08:11:56AM -0400, Theodore Y. Ts'o wrote:
>> On Tue, Sep 17, 2019 at 09:33:40AM +0200, Martin Steigerwald wrote:
>>> Willy Tarreau - 17.09.19, 07:24:38 CEST:
>>>> On Mon, Sep 16, 2019 at 06:46:07PM -0700, Matthew Garrett wrote:
>>>>>> Well, the patch actually made getrandom() return an error too, but
>>>>>> you seem more interested in the hypotheticals than in arguing
>>>>>> actualities.
>>>>> If you want to be safe, terminate the process.
>>>>
>>>> This is an interesting approach. At least it will cause bug reports in
>>>> application using getrandom() in an unreliable way and they will
>>>> check for other options. Because one of the issues with systems that
>>>> do not finish to boot is that usually the user doesn't know what
>>>> process is hanging.
>>>
>>
>> I would be happy with a change which changes getrandom(0) to send a
>> kill -9 to the process if it is called too early, with a new flag,
>> getrandom(GRND_BLOCK) which blocks until entropy is available.  That
>> leaves it up to the application developer to decide what behavior they
>> want.
>>
> 
> Yup, I'm convinced that's the sanest option too. I'll send a final RFC
> patch tonight implementing the following:
> 
> config GETRANDOM_CRNG_ENTROPY_MAX_WAIT_MS
> 	int
> 	default 3000
> 	help
> 	  Default max wait in milliseconds, for the getrandom(2) system
> 	  call when asking for entropy from the urandom source, until
> 	  the Cryptographic Random Number Generator (CRNG) gets
> 	  initialized.  Any process exceeding this duration for entropy
> 	  wait will get killed by kernel. The maximum wait can be
> 	  overriden through the "random.getrandom_max_wait_ms" kernel
> 	  boot parameter. Rationale follows.
> 
> 	  When the getrandom(2) system call was created, it came with
> 	  the clear warning: "Any userspace program which uses this new
> 	  functionality must take care to assure that if it is used
> 	  during the boot process, that it will not cause the init
> 	  scripts or other portions of the system startup to hang
> 	  indefinitely.
> 
> 	  Unfortunately, due to multiple factors, including not having
> 	  this warning written in a scary enough language in the
> 	  manpages, and due to glibc since v2.25 implementing a BSD-like
> 	  getentropy(3) in terms of getrandom(2), modern user-space is
> 	  calling getrandom(2) in the boot path everywhere.
> 
> 	  Embedded Linux systems were first hit by this, and reports of
> 	  embedded system "getting stuck at boot" began to be
> 	  common. Over time, the issue began to even creep into consumer
> 	  level x86 laptops: mainstream distributions, like Debian
> 	  Buster, began to recommend installing haveged as a workaround,
> 	  just to let the system boot.
> 
> 	  Filesystem optimizations in EXT4 and XFS exagerated the
> 	  problem, due to aggressive batching of IO requests, and thus
> 	  minimizing sources of entropy at boot. This led to large
> 	  delays until the kernel's Cryptographic Random Number
> 	  Generator (CRNG) got initialized, and thus having reports of
> 	  getrandom(2) inidifinitely stuck at boot.
> 
> 	  Solve this problem by setting a conservative upper bound for
> 	  getrandom(2) wait. Kill the process, instead of returning an
> 	  error code, because otherwise crypto-sensitive applications
> 	  may revert to less secure mechanisms (e.g. /dev/urandom). We
> 	  __deeply encourage__ system integrators and distribution
> 	  builders not to considerably increase this value: during
> 	  system boot, you either have entropy, or you don't. And if you
> 	  didn't have entropy, it will stay like this forever, because
> 	  if you had, you wouldn't have blocked in the first place. It's
> 	  an atomic "either/or" situation, with no middle ground. Please
> 	  think twice.
> 
> 	  Ideally, systems would be configured with hardware random
> 	  number generators, and/or configured to trust the CPU-provided
> 	  RNG's (CONFIG_RANDOM_TRUST_CPU) or boot-loader provided ones
> 	  (CONFIG_RANDOM_TRUST_BOOTLOADER).  In addition, userspace
> 	  should generate cryptographic keys only as late as possible,
> 	  when they are needed, instead of during early boot.  (For
> 	  non-cryptographic use cases, such as dictionary seeds or MIT
> 	  Magic Cookies, other mechanisms such as /dev/urandom or
> 	  random(3) may be more appropropriate.)
> 
> Sounds good?
> 
> thanks,
> 
> --
> Ahmed Darwish
> http://darwish.chasingpointers.com
> 

This would fail the litmus test that started this thread, re-explained 
below.

0. Linus applies your patch.
1. A kernel release happens, and it boots fine.
2. Ted Ts'o invents yet another brilliant ext4 optimization, and it gets 
merged.
3. Somebody discovers that the new kernel kills all his processes, up to 
and including gnome-session, and that's obviously a regression.
4. Linus is forced to revert (2), nobody wins.

-- 
Alexander E. Patrakov


[-- Attachment #2: S/MIME cryptographic signature --]
[-- Type: application/pkcs7-signature, Size: 4052 bytes --]


* Re: Linux 5.3-rc8
  2019-09-17 12:30                                                                         ` Ahmed S. Darwish
  2019-09-17 12:46                                                                           ` Alexander E. Patrakov
@ 2019-09-17 12:47                                                                           ` Willy Tarreau
  2019-09-17 16:08                                                                           ` Lennart Poettering
  2 siblings, 0 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-17 12:47 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Theodore Y. Ts'o, Martin Steigerwald, Matthew Garrett,
	Linus Torvalds, Vito Caputo, Lennart Poettering, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

On Tue, Sep 17, 2019 at 12:30:15PM +0000, Ahmed S. Darwish wrote:
> Sounds good?

Sounds good to me except that I'd like to have the option to get
poor randoms. getrandom() is used when /dev/urandom is not accessible
or painful to use. Until we provide applications with a solution to
this fairly common need, the problem will continue to regularly pop
up, in a different way ("my application randomly crashes at boot").
Let's get GRND_INSECURE in addition to your change and I think all
needs will be properly covered.

Thanks,
Willy


* Re: Linux 5.3-rc8
  2019-09-17 12:11                                                                       ` Theodore Y. Ts'o
  2019-09-17 12:30                                                                         ` Ahmed S. Darwish
@ 2019-09-17 13:11                                                                         ` Alexander E. Patrakov
  2019-09-17 13:37                                                                           ` Alexander E. Patrakov
  2019-09-17 15:57                                                                         ` Lennart Poettering
  2 siblings, 1 reply; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-17 13:11 UTC (permalink / raw)
  To: Theodore Y. Ts'o, Martin Steigerwald
  Cc: Willy Tarreau, Matthew Garrett, Linus Torvalds, Ahmed S. Darwish,
	Vito Caputo, Lennart Poettering, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4, lkml


[-- Attachment #1: Type: text/plain, Size: 2059 bytes --]

17.09.2019 17:11, Theodore Y. Ts'o wrote:
> There are only two ways out of this mess.  The first option is we take
> functionality away from a userspace author who Really Wants A Secure
> Random Number Generator.  And there are an awful lot of programs who
> really want secure crypto, because this is not a hypothetical.  The
> result in "Mining your P's and Q's" did happen before.  If we forget
> the history, we are doomed to repeat it.

You cannot take away functionality that does not really exist. Every 
time somebody tries to use it, there is huge news, "the boot process 
is blocked on application FOO", followed by an insecure fallback to 
/dev/urandom in the said application or library.

Regarding the "Mining your P's and Q's" paper: I would say it is a 
combination of TWO faults, only one of which (poor, or, as explained 
below, "marginally poor" entropy) is discussed and the other one (not 
really sound crypto when deriving the RSA key from the 
presumably available entropy) is ignored.

The authors of the paper factored the weak keys by applying the 
generalized GCD algorithm, thus looking for common factors in the RSA 
public keys. For two RSA public keys to be detected as faulty, they must 
share exactly one of their prime factors. In other words: repeated keys 
were specifically excluded from the study by the paper authors.
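
A minimal sketch of that detection step, using toy 64-bit values instead
of real RSA moduli (which would need a bignum library). The helper names
gcd_u64() and shared_prime() are illustrative assumptions, not taken from
the paper:

```c
#include <stdint.h>

/* Euclid's algorithm on 64-bit values (toy sizes only). */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
	while (b != 0) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Returns the prime shared by two distinct moduli, or 0 if the pair is
 * not detected: either the moduli are coprime, or they are identical
 * (a fully duplicated key, where the GCD is n itself and reveals no factor).
 */
static uint64_t shared_prime(uint64_t n1, uint64_t n2)
{
	uint64_t g;

	if (n1 == n2)
		return 0;
	g = gcd_u64(n1, n2);
	return g > 1 ? g : 0;
}
```

With n1 = 101 * 103 and n2 = 101 * 107, shared_prime(n1, n2) yields 101,
after which both moduli factor by a single division each.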

Sharing only one of the two primes means that the systems in
question behaved identically when they generated the first prime, but
diverged (possibly due to extra entropy becoming available) when
they generated the second one. And asking for randomness for p and for q
separately is what I would call the application bug here that nobody
wants to talk about: both p and q should have been derived from a CSPRNG
seeded by a single read from a random source. If that practice were
followed, then it would either result in a duplicate key (which is not
as bad as a factorable one), or in completely unrelated keys.

-- 
Alexander E. Patrakov


[-- Attachment #2: S/MIME cryptographic signature --]
[-- Type: application/pkcs7-signature, Size: 4052 bytes --]


* Re: Linux 5.3-rc8
  2019-09-17 13:11                                                                         ` Alexander E. Patrakov
@ 2019-09-17 13:37                                                                           ` Alexander E. Patrakov
  0 siblings, 0 replies; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-17 13:37 UTC (permalink / raw)
  To: Theodore Y. Ts'o, Martin Steigerwald
  Cc: Willy Tarreau, Matthew Garrett, Linus Torvalds, Ahmed S. Darwish,
	Vito Caputo, Lennart Poettering, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4, lkml


[-- Attachment #1: Type: text/plain, Size: 2389 bytes --]

On 17.09.2019 18:11, Alexander E. Patrakov wrote:
> On 17.09.2019 17:11, Theodore Y. Ts'o wrote:
>> There are only two ways out of this mess.  The first option is we take
>> functionality away from a userspace author who Really Wants A Secure
>> Random Number Generator.  And there are an awful lot of programs who
>> really want secure crypto, because this is not a hypothetical.  The
>> result in "Mining your P's and Q's" did happen before.  If we forget
>> the history, we are doomed to repeat it.
> 
> You cannot take away functionality that does not really exist. Every
> time somebody tries to use it, it makes big news ("the boot process
> is blocked on application FOO"), followed by an insecure fallback to
> /dev/urandom in the said application or library.
> 
> Regarding the "Mining your P's and Q's" paper: I would say it is a
> combination of TWO faults, only one of which (poor, or, as explained
> below, "marginally poor" entropy) is discussed, while the other one
> (not really sound crypto when deriving the RSA key from the
> presumably available entropy) is ignored.
> 
> The authors of the paper factored the weak keys by applying the 
> generalized GCD algorithm, thus looking for common factors in the RSA 
> public keys. For two RSA public keys to be detected as faulty, they must 
> share exactly one of their prime factors. In other words: repeated keys 
> were specifically excluded from the study by the paper authors.
> 
> Sharing only one of the two primes means that the systems in
> question behaved identically when they generated the first prime, but
> diverged (possibly due to extra entropy becoming available) when
> they generated the second one. And asking for randomness for p and for q
> separately is what I would call the application bug here that nobody
> wants to talk about: both p and q should have been derived from a CSPRNG
> seeded by a single read from a random source. If that practice were
> followed, then it would either result in a duplicate key (which is not
> as bad as a factorable one), or in completely unrelated keys.

I take this back. Of course, completely duplicate keys are weak keys, 
and they are even more dangerous because they are not distinguishable 
from intentionally copied good keys by the method in the paper.

-- 
Alexander E. Patrakov


[-- Attachment #2: S/MIME cryptographic signature --]
[-- Type: application/pkcs7-signature, Size: 4052 bytes --]


* Re: Linux 5.3-rc8
  2019-09-16 17:21                                             ` Theodore Y. Ts'o
                                                                 ` (2 preceding siblings ...)
  2019-09-16 19:53                                               ` Ahmed S. Darwish
@ 2019-09-17 15:32                                               ` Lennart Poettering
  3 siblings, 0 replies; 211+ messages in thread
From: Lennart Poettering @ 2019-09-17 15:32 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Linus Torvalds, Willy Tarreau, Vito Caputo, Ahmed S. Darwish,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml

On Mon, 16.09.19 13:21, Theodore Y. Ts'o (tytso@mit.edu) wrote:

> We could create a new flag, GRND_INSECURE, which never blocks.  And
> that allows us to solve the problem for silly applications that
> are using getrandom(2) for non-cryptographic use cases.  Use cases
> might include Python dictionary seeds, gdm for MIT Magic Cookie, UUID
> generation where best efforts probably is good enough, etc.  The
> answer today is they should just use /dev/urandom, since that exists
> today, and we have to support it for backwards compatibility anyway.
> It sounds like gdm recently switched to getrandom(2), and I suspect
> that it's going to get caught on some hardware configs anyway, even
> without the ext4 optimization patch.  So I suspect gdm will switch
> back to /dev/urandom, and this particular pain point will probably go
> away.

The problem is that reading from /dev/urandom at a point where it's
not initialized yet results in noisy kernel logging on current
kernels. If you want people to use /dev/urandom then the logging needs
to go away, because it scares people, makes them file bug reports and
so on, even though there isn't actually any problem for these specific
purposes.

For that reason I must say I'd prefer GRND_INSECURE, because it
indicates that people grokked "I know I might get questionable entropy".

Lennart


* Re: Linux 5.3-rc8
  2019-09-17 12:11                                                                       ` Theodore Y. Ts'o
  2019-09-17 12:30                                                                         ` Ahmed S. Darwish
  2019-09-17 13:11                                                                         ` Alexander E. Patrakov
@ 2019-09-17 15:57                                                                         ` Lennart Poettering
  2019-09-17 16:21                                                                           ` Willy Tarreau
  2 siblings, 1 reply; 211+ messages in thread
From: Lennart Poettering @ 2019-09-17 15:57 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Willy Tarreau, Matthew Garrett, Linus Torvalds, Ahmed S. Darwish,
	Vito Caputo, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, Alexander E. Patrakov, zhangjs, linux-ext4,
	lkml

On Tue, 17.09.19 08:11, Theodore Y. Ts'o (tytso@mit.edu) wrote:

> On Tue, Sep 17, 2019 at 09:33:40AM +0200, Martin Steigerwald wrote:
> > Willy Tarreau - 17.09.19, 07:24:38 CEST:
> > > On Mon, Sep 16, 2019 at 06:46:07PM -0700, Matthew Garrett wrote:
> > > > >Well, the patch actually made getrandom() return an error too, but
> > > > >you seem more interested in the hypotheticals than in arguing
> > > > >actualities.
> > > > If you want to be safe, terminate the process.
> > >
> > > This is an interesting approach. At least it will cause bug reports in
> > > application using getrandom() in an unreliable way and they will
> > > check for other options. Because one of the issues with systems that
> > > do not finish to boot is that usually the user doesn't know what
> > > process is hanging.
> >
>
> I would be happy with a change which changes getrandom(0) to send a
> kill -9 to the process if it is called too early, with a new flag,
> getrandom(GRND_BLOCK) which blocks until entropy is available.  That
> leaves it up to the application developer to decide what behavior they
> want.

Note that calling getrandom(0) "too early" is not something people do
on purpose. It happens by accident, i.e. because we live in a world
where SSH or HTTPS or so is run in the initrd already, and in a world
where booting sometimes can be very very fast. So even if you write a
program and you think "this stuff should run late I'll just
getrandom(0)" it might not actually be the case in real life, because
people deploy it slightly differently than you initially thought, on a
slightly differently equipped system with other runtime behaviour...

Lennart


* Re: Linux 5.3-rc8
  2019-09-17 12:30                                                                         ` Ahmed S. Darwish
  2019-09-17 12:46                                                                           ` Alexander E. Patrakov
  2019-09-17 12:47                                                                           ` Willy Tarreau
@ 2019-09-17 16:08                                                                           ` Lennart Poettering
  2019-09-17 16:23                                                                             ` Linus Torvalds
  2 siblings, 1 reply; 211+ messages in thread
From: Lennart Poettering @ 2019-09-17 16:08 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Theodore Y. Ts'o, Willy Tarreau, Matthew Garrett,
	Linus Torvalds, Vito Caputo, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Tue, 17.09.19 12:30, Ahmed S. Darwish (darwish.07@gmail.com) wrote:

> 	  Ideally, systems would be configured with hardware random
> 	  number generators, and/or configured to trust the CPU-provided
> 	  RNG's (CONFIG_RANDOM_TRUST_CPU) or boot-loader provided ones
> 	  (CONFIG_RANDOM_TRUST_BOOTLOADER).  In addition, userspace
> 	  should generate cryptographic keys only as late as possible,
> 	  when they are needed, instead of during early boot.  (For
> 	  non-cryptographic use cases, such as dictionary seeds or MIT
> 	  Magic Cookies, other mechanisms such as /dev/urandom or
> 	  random(3) may be more appropriate.)
>
> Sounds good?

This sounds mean. You make apps pay for something they aren't really
at fault for.

I mean, in the cloud people typically put together images that are
replicated to many systems, and as the first thing generate an SSH key on
the individual system. In fact, most big distros tend to ship SSH
set up precisely this way: on first boot the SSH key is
generated. They tend to call getrandom(0) for this right now, and
rightfully so. Now suddenly you kill them because they are doing
everything correctly? Those systems aren't going to be more useful if
they have no SSH key at all than they would be if they hung at
boot: either way you can't log in.

Here's what I'd propose:

1) Add GRND_INSECURE to get those users of getrandom() who do not need
   high quality entropy off its use (systemd has uses for this, for
   seeding hash tables for example), thus reducing the places where
   things might block.

2) Add a kernel log message if a getrandom(0) client hung for 15s or
   more, explaining the situation briefly, but not otherwise changing
   behaviour.

3) Change systemd-random-seed.service to log to console in the same
   case, blocking boot cleanly and discoverably.
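
For point 1, a sketch of what such a non-cryptographic user can already
do without a new flag, assuming GRND_NONBLOCK plus a /dev/urandom
fallback is acceptable for the purpose. fill_best_effort() is a
hypothetical helper name, and the /dev/urandom path can still trigger
the kernel logging mentioned earlier in this thread:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Best-effort randomness for non-cryptographic uses (hash table seeds
 * and the like). Never waits for the pool to initialize.
 * Returns 0 on success, -1 on failure.
 */
static int fill_best_effort(void *buf, size_t len)
{
	unsigned char *p = buf;
	size_t done = 0;
	int fd;

	/* First choice: getrandom() without blocking. */
	while (done < len) {
		ssize_t n = getrandom(p + done, len - done, GRND_NONBLOCK);

		if (n > 0) {
			done += (size_t)n;
			continue;
		}
		if (n < 0 && errno == EINTR)
			continue;
		break;	/* EAGAIN (pool not ready), ENOSYS (old kernel), ... */
	}
	if (done == len)
		return 0;

	/* Fallback: /dev/urandom, which does not block either. */
	fd = open("/dev/urandom", O_RDONLY);
	if (fd < 0)
		return -1;
	while (done < len) {
		ssize_t n = read(fd, p + done, len - done);

		if (n > 0) {
			done += (size_t)n;
			continue;
		}
		if (n < 0 && errno == EINTR)
			continue;
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}
```

The result must not be used for key material: on an uninitialized pool
the bytes may be predictable, which is exactly the trade-off the caller
is opting into.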

I am not a fan of randomly killing userspace processes that just
happened to be the unlucky ones to call this first... I see no
benefit in killing stuff over letting boot hang in a discoverable way.

Lennart


* Re: Linux 5.3-rc8
  2019-09-17 15:57                                                                         ` Lennart Poettering
@ 2019-09-17 16:21                                                                           ` Willy Tarreau
  2019-09-17 17:13                                                                             ` Lennart Poettering
  2019-09-17 20:36                                                                             ` Martin Steigerwald
  0 siblings, 2 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-17 16:21 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Theodore Y. Ts'o, Matthew Garrett, Linus Torvalds,
	Ahmed S. Darwish, Vito Caputo, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Tue, Sep 17, 2019 at 05:57:43PM +0200, Lennart Poettering wrote:
> Note that calling getrandom(0) "too early" is not something people do
> on purpose. It happens by accident, i.e. because we live in a world
> where SSH or HTTPS or so is run in the initrd already, and in a world
> where booting sometimes can be very very fast.

It's not an accident, it's a lack of understanding of the impacts
from the people who package the systems. Generating an SSH key from
an initramfs without thinking where the randomness used for this could
come from is not accidental, it's a lack of experience that will be
fixed once they start to collect such reports. And those who absolutely
need their SSH daemon or HTTPS server for a recovery image in initramfs
can very well feed fake entropy by dumping whatever they want into
/dev/random to make it possible to build temporary keys for use within
this single session. At least all supposedly incorrect use will be made
*on purpose* and will still be possible to match what users need.

> So even if you write a
> program and you think "this stuff should run late I'll just
> getrandom(0)" it might not actually be the case in real life, because
> people deploy it slightly differently than you initially thought, on a
> slightly differently equipped system with other runtime behaviour...

I agree with this, it's precisely because I think we should not restrict
userspace capabilities that I want the issue addressed in a way that lets
users do what they need instead of relying on dangerous workarounds. Just
googling for "mknod /dev/random c 1 9" returns tens, maybe hundreds of
pages all explaining how to fix the problem of non-booting systems. It
simply proves that the kernel is not the place to decide what users are
allowed to do. Let's give them the tools to work correctly and be
responsible for their choices. They just need to be hit by bad choices
to get some feedback from the field other than a new list of well-known
SSH keys.

Willy


* Re: Linux 5.3-rc8
  2019-09-17 16:08                                                                           ` Lennart Poettering
@ 2019-09-17 16:23                                                                             ` Linus Torvalds
  2019-09-17 16:34                                                                               ` Reindl Harald
  2019-09-17 17:42                                                                               ` Lennart Poettering
  0 siblings, 2 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-17 16:23 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Ahmed S. Darwish, Theodore Y. Ts'o, Willy Tarreau,
	Matthew Garrett, Vito Caputo, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Tue, Sep 17, 2019 at 9:08 AM Lennart Poettering <mzxreary@0pointer.de> wrote:
>
> Here's what I'd propose:

So I think this is ok, but I have another proposal. Before I post that
one, though, I just wanted to point out:

> 1) Add GRND_INSECURE to get those users of getrandom() who do not need
>    high quality entropy off its use (systemd has uses for this, for
>    seeding hash tables for example), thus reducing the places where
>    things might block.

I really think that the logic should be the other way around.

The getrandom() users that don't need high quality entropy are the
ones that don't really think about this, and so _they_ shouldn't be
the ones that have to explicitly state anything. To those users,
"random is random". By definition they don't much care, and quite
possibly they don't even know what "entropy" really means in that
context.

The ones that *do* want high security randomness should be the ones
that know that "random" means different things to different people,
and that randomness is hard.

So the onus should be on them to say that "yes, I'm one of those
people willing to wait".

That's why I'd like to see GRND_SECURE instead. That's kind of what
GRND_RANDOM is right now, but it went overboard and it's not useful
even to the people who do want secure random numbers.

Besides, the GRND_RANDOM naming doesn't really help the people who
don't know anyway, so it's just bad in so many ways. We should
probably just get rid of that flag entirely and make it imply
GRND_SECURE without the overdone entropy accounting, but that's a
separate issue.

When we do add GRND_SECURE, we should also add the GRND_INSECURE just
to allow people to mark their use, and to avoid the whole existing
confusion about "0".

> 2) Add a kernel log message if a getrandom(0) client hung for 15s or
>    more, explaining the situation briefly, but not otherwise changing
>    behaviour.

The problem is that when you have some graphical boot, you'll not even
see the kernel messages ;(

I do agree that a message is a good idea regardless, but I don't think
it necessarily solves the problems except for developers.

> 3) Change systemd-random-seed.service to log to console in the same
>    case, blocking boot cleanly and discoverably.

So I think systemd-random-seed might as well just use a new
GRND_SECURE, and then not even have to worry about it.

That said, I think I have a suggestion that everybody can live with -
even if they might not be _happy_ about it. See next email.

> I am not a fan of randomly killing userspace processes that just
> happened to be the unlucky ones to call this first... I see no
> benefit in killing stuff over letting boot hang in a discoverable way.

Absolutely agreed. The point was to not break things.

              Linus


* Re: Linux 5.3-rc8
  2019-09-17  7:33                                                                     ` Martin Steigerwald
  2019-09-17  8:35                                                                       ` Willy Tarreau
  2019-09-17 12:11                                                                       ` Theodore Y. Ts'o
@ 2019-09-17 16:27                                                                       ` Linus Torvalds
  2019-09-17 16:34                                                                         ` Matthew Garrett
                                                                                           ` (2 more replies)
  2 siblings, 3 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-17 16:27 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Willy Tarreau, Matthew Garrett, Ahmed S. Darwish,
	Theodore Y. Ts'o, Vito Caputo, Lennart Poettering,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml


[-- Attachment #1: Type: text/plain, Size: 4000 bytes --]

On Tue, Sep 17, 2019 at 12:33 AM Martin Steigerwald <martin@lichtvoll.de> wrote:
>
> So yes, that would make it harder to abuse the API, but not
> impossible. Which may still be good, I don't know.

So the real problem is not people abusing the ABI per se. Yes, I was a
bit worried about that too, but it's not the cause of the immediate
issue.

The real problem is that "getrandom(0)" is really _convenient_ for
people who just want random numbers - and not at all the "secure"
kind.

And it's convenient, and during development and testing, it always
"just works", because it doesn't ever block in any normal situation.

And then you deploy it, and on some poor user's machine it *does*
block, because the program now encounters the "oops, no entropy"
situation that it never ever encountered on the development machine,
because the testing there was mainly done not during booting, but the
developer also probably had a much more modern machine that had
rdrand, and that quite possibly also had more services enabled at
bootup etc so even without rdrand it got tons of entropy.

That's why

 (a) killing the process is _completely_ silly.  It misses the whole
point of the problem in the first place and only makes things much
worse.

 (b) we should just change getrandom() and add that GRND_SECURE flag
instead. Because the current API is fundamentally confusing. If you
want secure random numbers, you should really deeply _know_ about it,
and think about it, rather than have it be the "oh, don't even bother
passing any flags, it's secure by default".

 (c) the timeout approach isn't wonderful, but it at least helps with
the "this was never tested under those circumstances" kind of problem.

Note that the people who actually *thought* about getrandom() and use
it correctly should already handle error returns (even for the
blocking version), because getrandom() can already return EINTR. So
the argument that we should cater primarily to the secure key people
is not all that strong. We should be able to return EINTR, and the
people who *thought* about blocking and about entropy should be fine.
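
As an illustration of that point, a blocking caller that retries on
EINTR and copes with short reads might look like this. A sketch only:
fill_secure() is a hypothetical helper name, and getrandom(2) is reached
through the glibc wrapper in <sys/random.h>:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Blocking fill for key material: waits until the pool is initialized,
 * retries when a signal interrupts the call, and continues after short
 * reads. Returns 0 on success, -1 with errno set otherwise.
 */
static int fill_secure(void *buf, size_t len)
{
	unsigned char *p = buf;

	while (len > 0) {
		ssize_t n = getrandom(p, len, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: just retry */
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

A caller written this way keeps working if the kernel starts returning
EINTR in more situations; it simply goes around the loop again.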

And gdm and other silly random users that never wanted entropy in the
first place, just "random" random numbers, wouldn't be in the
situation they are now.

That said - looking at some of the problematic traces that Ahmed
posted for his bootup problem, I actually think we can use *another*
heuristic to solve the problem. Namely just looking at how much
randomness the caller wants.

The processes that ask for randomness for an actual secure key have a
very fundamental constraint: they need enough randomness for the key
to be secure in the first place.

But look at what gnome-shell and gnome-session-b do:

    https://lore.kernel.org/linux-ext4/20190912034421.GA2085@darwi-home-pc/

and most of them already set GRND_NONBLOCK, but look at the
problematic one that actually causes the boot problem:

    gnome-session-b-327   4.400620: getrandom(16 bytes, flags = 0)

and here the big clue is: "Hey, it only asks for 128 bits of randomness".

Does anybody believe that 128 bits of randomness is a good basis for a
long-term secure key? Even if the key itself contains than that, if
you are generating a long-term secure key in this day and age, you had
better be asking for more than 128 bits of actual unpredictable base
data. So just based on the size of the request we can determine that
this is not hugely important.

Compare that to the case later on for something that seems to ask for
actual interesting randomness, and - just judging by the name -
probably even has a reason for it:

      gsd-smartcard-388   51.433924: getrandom(110 bytes, flags = 0)
      gsd-smartcard-388   51.433936: getrandom(256 bytes, flags = 0)

big difference.

End result: I would propose the attached patch.

Ahmed, can you just verify that it works for you (obviously with the
ext4 plugging reinstated)? It looks like it should "obviously" fix
things, but still...

                    Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1740 bytes --]

 drivers/char/random.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 566922df4b7b..7be771eac969 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2118,6 +2118,37 @@ const struct file_operations urandom_fops = {
 	.llseek = noop_llseek,
 };
 
+/*
+ * Hacky workaround for the fact that some processes
+ * ask for truly secure random numbers and absolutely want
+ * to wait for the entropy pool to fill, and others just
+ * do "getrandom(0)" to get some ad-hoc random numbers.
+ *
+ * If you're generating a secure key, you'd better ask for
+ * more than 128 bits of randomness. Otherwise it's not
+ * really all that secure by definition.
+ *
+ * We should add a GRND_SECURE flag so that people can state
+ * this "I want secure random numbers" explicitly.
+ */
+static int wait_for_getrandom(size_t count)
+{
+	unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
+	int ret;
+
+	/* We'll give even small requests _some_ time to get more entropy */
+	if (count <= 16)
+		timeout = 5*HZ;
+
+	ret = wait_event_interruptible_timeout(crng_init_wait, crng_ready(), timeout);
+	if (likely(ret))
+		return ret > 0 ? 0 : ret;
+
+	/* Timed out - we'll return urandom */
+	pr_notice("random: falling back to urandom for small request of %zu bytes", count);
+	return 0;
+}
+
 SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
 		unsigned int, flags)
 {
@@ -2135,7 +2166,7 @@ SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
 	if (!crng_ready()) {
 		if (flags & GRND_NONBLOCK)
 			return -EAGAIN;
-		ret = wait_for_random_bytes();
+		ret = wait_for_getrandom(count);
 		if (unlikely(ret))
 			return ret;
 	}


* Re: Linux 5.3-rc8
  2019-09-17 16:23                                                                             ` Linus Torvalds
@ 2019-09-17 16:34                                                                               ` Reindl Harald
  2019-09-17 17:42                                                                               ` Lennart Poettering
  1 sibling, 0 replies; 211+ messages in thread
From: Reindl Harald @ 2019-09-17 16:34 UTC (permalink / raw)
  To: Linus Torvalds, Lennart Poettering
  Cc: Ahmed S. Darwish, Theodore Y. Ts'o, Willy Tarreau,
	Matthew Garrett, Vito Caputo, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml



On 17.09.19 at 18:23, Linus Torvalds wrote:
> I do agree that a message is a good idea regardless, but I don't think
> it necessarily solves the problems except for developers

sadly in our current world developers and maintainers don't read any logs
and as long as it compiles and boots it works and can be pushed :-(

they even argue instead of fixing a damned line in a textfile which could have
been fixed 8 years in advance and i have written a ton of such reports
for F30, not talking about 15 others where software spits warnings with
the source file and line into the syslog and nobody out there gives a
damn about it

one example of many
https://bugzilla.redhat.com/show_bug.cgi?id=1748322

the only way you can get developers to clean up their mess these days is
to spit it straight into their face in a modal window every time they log in
but how to exclude innocent endusers.....

half of my "rsyslog.conf" is to filter out stuff i can't fix anyways to
have my peace when i call the script below every time i reboot whatever
linux machine

the 'usb_serial_init - returning with error' is BTW Linux when you boot
with 'nousb usbcore.nousb'

------------------

[root@srv-rhsoft:~]$ cat /scripts/system-errors.sh
#!/usr/bin/dash
dmesg -T | grep --color -i warn |
  grep -v 'Perf event create on CPU' |
  grep -v 'Hardware RNG Device' |
  grep -v 'TPM RNG Device' |
  grep -v 'Correctable Errors collector initialized' |
  grep -v 'error=format-security' |
  grep -v 'MHD_USE_THREAD_PER_CONNECTION' |
  grep -v 'usb_serial_init - returning with error' |
  grep -v 'systemd-journald.service' |
  grep -v 'usb_serial_init - registering generic driver failed'
grep --color -i warn /var/log/messages |
  grep -v 'Perf event create on CPU' |
  grep -v 'Hardware RNG Device' |
  grep -v 'TPM RNG Device' |
  grep -v 'Correctable Errors collector initialized' |
  grep -v 'error=format-security' |
  grep -v 'MHD_USE_THREAD_PER_CONNECTION' |
  grep -v 'usb_serial_init - returning with error' |
  grep -v 'systemd-journald.service' |
  grep -v 'usb_serial_init - registering generic driver failed'
dmesg -T | grep --color -i fail |
  grep -v 'BAR 13' |
  grep -v 'Perf event create on CPU' |
  grep -v 'Hardware RNG Device' |
  grep -v 'TPM RNG Device' |
  grep -v 'Correctable Errors collector initialized' |
  grep -v 'error=format-security' |
  grep -v 'MHD_USE_THREAD_PER_CONNECTION' |
  grep -v 'usb_serial_init - returning with error' |
  grep -v 'systemd-journald.service' |
  grep -v 'usb_serial_init - registering generic driver failed'
grep --color -i fail /var/log/messages |
  grep -v 'BAR 13' |
  grep -v 'Perf event create on CPU' |
  grep -v 'Hardware RNG Device' |
  grep -v 'TPM RNG Device' |
  grep -v 'Correctable Errors collector initialized' |
  grep -v 'error=format-security' |
  grep -v 'MHD_USE_THREAD_PER_CONNECTION' |
  grep -v 'usb_serial_init - returning with error' |
  grep -v 'systemd-journald.service' |
  grep -v 'usb_serial_init - registering generic driver failed'
dmesg -T | grep --color -i error |
  grep -v 'Perf event create on CPU' |
  grep -v 'Hardware RNG Device' |
  grep -v 'TPM RNG Device' |
  grep -v 'Correctable Errors collector initialized' |
  grep -v 'error=format-security' |
  grep -v 'MHD_USE_THREAD_PER_CONNECTION' |
  grep -v 'usb_serial_init - returning with error' |
  grep -v 'systemd-journald.service' |
  grep -v 'usb_serial_init - registering generic driver failed'
grep --color -i error /var/log/messages |
  grep -v 'Perf event create on CPU' |
  grep -v 'Hardware RNG Device' |
  grep -v 'TPM RNG Device' |
  grep -v 'Correctable Errors collector initialized' |
  grep -v 'error=format-security' |
  grep -v 'MHD_USE_THREAD_PER_CONNECTION' |
  grep -v 'usb_serial_init - returning with error' |
  grep -v 'systemd-journald.service' |
  grep -v 'usb_serial_init - registering generic driver failed'
grep --color -i "scheduling restart" /var/log/messages |
  grep -v 'systemd-journald.service'
[root@srv-rhsoft:~]$


* Re: Linux 5.3-rc8
  2019-09-17 16:27                                                                       ` Linus Torvalds
@ 2019-09-17 16:34                                                                         ` Matthew Garrett
  2019-09-17 17:16                                                                           ` Willy Tarreau
  2019-09-17 16:58                                                                         ` Alexander E. Patrakov
  2019-09-17 17:28                                                                         ` Lennart Poettering
  2 siblings, 1 reply; 211+ messages in thread
From: Matthew Garrett @ 2019-09-17 16:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Steigerwald, Willy Tarreau, Ahmed S. Darwish,
	Theodore Y. Ts'o, Vito Caputo, Lennart Poettering,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml

On Tue, Sep 17, 2019 at 09:27:44AM -0700, Linus Torvalds wrote:

> Does anybody believe that 128 bits of randomness is a good basis for a
> long-term secure key?

Yes, it's exactly what you'd expect for an AES 128 key, which is still 
considered to be secure.

-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: Linux 5.3-rc8
  2019-09-17 16:27                                                                       ` Linus Torvalds
  2019-09-17 16:34                                                                         ` Matthew Garrett
@ 2019-09-17 16:58                                                                         ` Alexander E. Patrakov
  2019-09-17 17:30                                                                           ` Lennart Poettering
  2019-09-17 17:28                                                                         ` Lennart Poettering
  2 siblings, 1 reply; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-17 16:58 UTC (permalink / raw)
  To: Linus Torvalds, Martin Steigerwald
  Cc: Willy Tarreau, Matthew Garrett, Ahmed S. Darwish,
	Theodore Y. Ts'o, Vito Caputo, Lennart Poettering,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	zhangjs, linux-ext4, lkml

17.09.2019 21:27, Linus Torvalds wrote:
> On Tue, Sep 17, 2019 at 12:33 AM Martin Steigerwald <martin@lichtvoll.de> wrote:
>>
>> So yes, that would make it harder to abuse the API, but not
>> impossible. Which may still be good, I don't know.
> 
> So the real problem is not people abusing the ABI per se. Yes, I was a
> bit worried about that too, but it's not the cause of the immediate
> issue.
> 
> The real problem is that "getrandom(0)" is really _convenient_ for
> people who just want random numbers - and not at all the "secure"
> kind.
> 
> And it's convenient, and during development and testing, it always
> "just works", because it doesn't ever block in any normal situation.
> 
> And then you deploy it, and on some poor user's machine it *does*
> block, because the program now encounters the "oops, no entropy"
> situation that it never ever encountered on the development machine,
> because the testing there was mainly done not during booting, but the
> developer also probably had a much more modern machine that had
> rdrand, and that quite possibly also had more services enabled at
> bootup etc so even without rdrand it got tons of entropy.
> 
> That's why
> 
>   (a) killing the process is _completely_ silly.  It misses the whole
> point of the problem in the first place and only makes things much
> worse.
> 
>   (b) we should just change getrandom() and add that GRND_SECURE flag
> instead. Because the current API is fundamentally confusing. If you
> want secure random numbers, you should really deeply _know_ about it,
> and think about it, rather than have it be the "oh, don't even bother
> passing any flags, it's secure by default".
> 
>   (c) the timeout approach isn't wonderful, but it at least helps with
> the "this was never tested under those circumstances" kind of problem.
> 
> Note that the people who actually *thought* about getrandom() and use
> it correctly should already handle error returns (even for the
> blocking version), because getrandom() can already return EINTR. So
> the argument that we should cater primarily to the secure key people
> is not all that strong. We should be able to return EINTR, and the
> people who *thought* about blocking and about entropy should be fine.
> 
> And gdm and other silly random users that never wanted entropy in the
> first place, just "random" random numbers, wouldn't be in the
> situation they are now.
> 
> That said - looking at some of the problematic traces that Ahmed
> posted for his bootup problem, I actually think we can use *another*
> heuristic to solve the problem. Namely just looking at how much
> randomness the caller wants.
> 
> The processes that ask for randomness for an actual secure key have a
> very fundamental constraint: they need enough randomness for the key
> to be secure in the first place.
> 
> But look at what gnome-shell and gnome-session-b does:
> 
>      https://lore.kernel.org/linux-ext4/20190912034421.GA2085@darwi-home-pc/
> 
> and most of them already set GRND_NONBLOCK, but look at the
> problematic one that actually causes the boot problem:
> 
>      gnome-session-b-327   4.400620: getrandom(16 bytes, flags = 0)
> 
> and here the big clue is: "Hey, it only asks for 128 bits of randomness".
> 
> Does anybody believe that 128 bits of randomness is a good basis for a
> long-term secure key? Even if the key itself contains than that, if
> you are generating a long-term secure key in this day and age, you had
> better be asking for more than 128 bits of actual unpredictable base
> data. So just based on the size of the request we can determine that
> this is not hugely important.
> 
> Compare that to the case later on for something that seems to ask for
> actual interesting randomness, and - just judging by the name -
> probably even has a reason for it:
> 
>        gsd-smartcard-388   51.433924: getrandom(110 bytes, flags = 0)
>        gsd-smartcard-388   51.433936: getrandom(256 bytes, flags = 0)
> 
> big difference.
> 
> End result: I would propose the attached patch.
> 
> Ahmed, can you just verify that it works for you (obviously with the
> ext4 plugging reinstated)? It looks like it should "obviously" fix
> things, but still...

I have looked at the patch, but have not tested it.

I am worried that the getrandom delays will be serialized, because 
processes sometimes run one after another. If there are enough 
chained/dependent processes that ask for randomness before it is ready, 
the end result is still a too-big delay, essentially a failed boot.

In other words: your approach of adding delays only makes sense for 
heavily parallelized boot, which may not be the case, especially for 
embedded systems that don't like systemd.

-- 
Alexander E. Patrakov


* Re: Linux 5.3-rc8
  2019-09-17 16:21                                                                           ` Willy Tarreau
@ 2019-09-17 17:13                                                                             ` Lennart Poettering
  2019-09-17 17:29                                                                               ` Willy Tarreau
  2019-09-17 20:36                                                                             ` Martin Steigerwald
  1 sibling, 1 reply; 211+ messages in thread
From: Lennart Poettering @ 2019-09-17 17:13 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Theodore Y. Ts'o, Matthew Garrett, Linus Torvalds,
	Ahmed S. Darwish, Vito Caputo, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Tue, 17.09.19 18:21, Willy Tarreau (w@1wt.eu) wrote:

> On Tue, Sep 17, 2019 at 05:57:43PM +0200, Lennart Poettering wrote:
> > Note that calling getrandom(0) "too early" is not something people do
> > on purpose. It happens by accident, i.e. because we live in a world
> > where SSH or HTTPS or so is run in the initrd already, and in a world
> > where booting sometimes can be very very fast.
>
> It's not an accident, it's a lack of understanding of the impacts
> from the people who package the systems. Generating an SSH key from
> an initramfs without thinking where the randomness used for this could
> come from is not accidental, it's a lack of experience that will be
> fixed once they start to collect such reports. And those who absolutely
> need their SSH daemon or HTTPS server for a recovery image in initramfs
> can very well feed fake entropy by dumping whatever they want into
> /dev/random to make it possible to build temporary keys for use within
> this single session. At least all supposedly incorrect use will be made
> *on purpose* and will still be possible to match what users need.

What do you expect these systems to do though?

I mean, think about general purpose distros: they put together live
images that are supposed to work on a myriad of similar (as in: same
arch) but otherwise very different systems (i.e. VMs that might lack
any form of RNG source the same as beefy servers with multiple sources
the same as older netbooks with few and crappy sources, …). They can't
know what the specific hw will provide or won't. It's not their
incompetence that they build the image like that. It's a common, very
common usecase to install a system via SSH, and it's also very common
to have very generic images for a large number of varied systems to run
on.

Lennart

--
Lennart Poettering, Berlin


* Re: Linux 5.3-rc8
  2019-09-17 16:34                                                                         ` Matthew Garrett
@ 2019-09-17 17:16                                                                           ` Willy Tarreau
  2019-09-17 17:20                                                                             ` Matthew Garrett
  0 siblings, 1 reply; 211+ messages in thread
From: Willy Tarreau @ 2019-09-17 17:16 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Linus Torvalds, Martin Steigerwald, Ahmed S. Darwish,
	Theodore Y. Ts'o, Vito Caputo, Lennart Poettering,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml

On Tue, Sep 17, 2019 at 05:34:56PM +0100, Matthew Garrett wrote:
> On Tue, Sep 17, 2019 at 09:27:44AM -0700, Linus Torvalds wrote:
> 
> > Does anybody believe that 128 bits of randomness is a good basis for a
> > long-term secure key?
> 
> Yes, it's exactly what you'd expect for an AES 128 key, which is still 
> considered to be secure.

AES keys are for symmetrical encryption and thus as such are short-lived.
We're back to what Linus was saying about the fact that our urandom is
already very good for such use cases, it should just not be used to
produce long-lived keys (i.e. asymmetrical).

However I'm worried regarding this precise patch about the fact that
delays will add up. I think that once we've failed to wait for a first
process, we've broken any hypothetical trust in terms of random quality
so there's no point continuing to wait for future requests.

Willy


* Re: Linux 5.3-rc8
  2019-09-17 17:16                                                                           ` Willy Tarreau
@ 2019-09-17 17:20                                                                             ` Matthew Garrett
  2019-09-17 17:23                                                                               ` Matthew Garrett
  2019-09-17 17:57                                                                               ` Willy Tarreau
  0 siblings, 2 replies; 211+ messages in thread
From: Matthew Garrett @ 2019-09-17 17:20 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, Martin Steigerwald, Ahmed S. Darwish,
	Theodore Y. Ts'o, Vito Caputo, Lennart Poettering,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml

On Tue, Sep 17, 2019 at 07:16:41PM +0200, Willy Tarreau wrote:
> On Tue, Sep 17, 2019 at 05:34:56PM +0100, Matthew Garrett wrote:
> > On Tue, Sep 17, 2019 at 09:27:44AM -0700, Linus Torvalds wrote:
> > 
> > > Does anybody believe that 128 bits of randomness is a good basis for a
> > > long-term secure key?
> > 
> > Yes, it's exactly what you'd expect for an AES 128 key, which is still 
> > considered to be secure.
> 
> AES keys are for symmetrical encryption and thus as such are short-lived.
> We're back to what Linus was saying about the fact that our urandom is
> already very good for such use cases, it should just not be used to
> produce long-lived keys (i.e. asymmetrical).

AES keys are used for a variety of long-lived purposes (eg, disk 
encryption).
 
-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: Linux 5.3-rc8
  2019-09-17 17:20                                                                             ` Matthew Garrett
@ 2019-09-17 17:23                                                                               ` Matthew Garrett
  2019-09-17 17:57                                                                               ` Willy Tarreau
  1 sibling, 0 replies; 211+ messages in thread
From: Matthew Garrett @ 2019-09-17 17:23 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, Martin Steigerwald, Ahmed S. Darwish,
	Theodore Y. Ts'o, Vito Caputo, Lennart Poettering,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml

On Tue, Sep 17, 2019 at 06:20:02PM +0100, Matthew Garrett wrote:

> AES keys are used for a variety of long-lived purposes (eg, disk 
> encryption).

And as an example of when we'd want to do that during early boot - swap 
is frequently encrypted with a random key generated on each boot, but 
it's still important for that key to be strong in order to avoid someone 
being able to recover the contents of swap.

-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: Linux 5.3-rc8
  2019-09-17 16:27                                                                       ` Linus Torvalds
  2019-09-17 16:34                                                                         ` Matthew Garrett
  2019-09-17 16:58                                                                         ` Alexander E. Patrakov
@ 2019-09-17 17:28                                                                         ` Lennart Poettering
  2 siblings, 0 replies; 211+ messages in thread
From: Lennart Poettering @ 2019-09-17 17:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Willy Tarreau, Matthew Garrett, Ahmed S. Darwish,
	Theodore Y. Ts'o, Vito Caputo, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Tue, 17.09.19 09:27, Linus Torvalds (torvalds@linux-foundation.org) wrote:

> But look at what gnome-shell and gnome-session-b does:
>
>     https://lore.kernel.org/linux-ext4/20190912034421.GA2085@darwi-home-pc/
>
> and most of them already set GRND_NONBLOCK, but look at the
> problematic one that actually causes the boot problem:
>
>     gnome-session-b-327   4.400620: getrandom(16 bytes, flags = 0)
>
> and here the big clue is: "Hey, it only asks for 128 bits of
> randomness".

I don't think this is a good check to make.

In fact most cryptography folks say taking out more than 256bit is
never going to make sense, that's why BSD getentropy() even returns an
error if you ask for more than 256bit. (and glibc's getentropy()
wrapper around getrandom() enforces the same size limit btw)

On the BSDs the kernel's getentropy() call is primarily used to seed
their libc's arc4random() every now and then, and userspace is
supposed to use only arc4random(). I am pretty sure we should do the
same on Linux in the long run. i.e. the idea that everyone uses the
kernel syscall directly sounds wrong to me, and designing the syscall
so that everyone calls it is hence wrong too.

On the BSDs getentropy() is hence unconditionally blocking, without
any flags or so, which makes sense since it's not supposed to be
user-facing really so much, but more a basic primitive for low-level
userspace infrastructure only, that is supposed to be wrapped
non-trivially to be useful. (that's at least how I understood their
APIs)

> Does anybody believe that 128 bits of randomness is a good basis for a
> long-term secure key? Even if the key itself contains than that, if
> you are generating a long-term secure key in this day and age, you had
> better be asking for more than 128 bits of actual unpredictable base
> data. So just based on the size of the request we can determine that
> this is not hugely important.

aes128 is very common today. It's what baseline security is.

I have the suspicion crypto folks would argue that 128…256 is the only
sane range for cryptographic keys...

Lennart

--
Lennart Poettering, Berlin


* Re: Linux 5.3-rc8
  2019-09-17 17:13                                                                             ` Lennart Poettering
@ 2019-09-17 17:29                                                                               ` Willy Tarreau
  2019-09-17 20:42                                                                                 ` Martin Steigerwald
  2019-09-18 13:38                                                                                 ` Lennart Poettering
  0 siblings, 2 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-17 17:29 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Theodore Y. Ts'o, Matthew Garrett, Linus Torvalds,
	Ahmed S. Darwish, Vito Caputo, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Tue, Sep 17, 2019 at 07:13:28PM +0200, Lennart Poettering wrote:
> On Tue, 17.09.19 18:21, Willy Tarreau (w@1wt.eu) wrote:
> 
> > On Tue, Sep 17, 2019 at 05:57:43PM +0200, Lennart Poettering wrote:
> > > Note that calling getrandom(0) "too early" is not something people do
> > > on purpose. It happens by accident, i.e. because we live in a world
> > > where SSH or HTTPS or so is run in the initrd already, and in a world
> > > where booting sometimes can be very very fast.
> >
> > It's not an accident, it's a lack of understanding of the impacts
> > from the people who package the systems. Generating an SSH key from
> > an initramfs without thinking where the randomness used for this could
> > come from is not accidental, it's a lack of experience that will be
> > fixed once they start to collect such reports. And those who absolutely
> > need their SSH daemon or HTTPS server for a recovery image in initramfs
> > can very well feed fake entropy by dumping whatever they want into
> > /dev/random to make it possible to build temporary keys for use within
> > this single session. At least all supposedly incorrect use will be made
> > *on purpose* and will still be possible to match what users need.
> 
> What do you expect these systems to do though?
> 
> I mean, think about general purpose distros: they put together live
> images that are supposed to work on a myriad of similar (as in: same
> arch) but otherwise very different systems (i.e. VMs that might lack
> > any form of RNG source the same as beefy servers with multiple sources
> the same as older netbooks with few and crappy sources, ...). They can't
> know what the specific hw will provide or won't. It's not their
> incompetence that they build the image like that. It's a common, very
> common usecase to install a system via SSH, and it's also very common
> > to have very generic images for a large number of varied systems to run
> on.

I'm totally fine with installing the system via SSH, using a temporary
SSH key. I do make a strong distinction between the installation phase
and the final deployment. The SSH key used *for installation* doesn't
need to be the same as the final one. And very often at the end of the
installation we'll have produced enough entropy to produce a correct
key.

It's not because people got used to doing things the wrong way by
ignorance of how randomness works and raised this to an industrial
level that they must not adapt a little bit. If they insist on producing
an SSH key immediately at boot, you can be sure that many of those that
never fail are probably bad because they probably used some of the
tricks mentioned in this thread (like the fairly common mknod trick
that can make sense in a temporary system installation image) :-/

I maintain that we don't need the same amount of entropy to run a
regular system and to create a new key, and that as such it is not
a reasonable thing to do to create such a key as the first action.
I'm not saying that doing things correctly is as easy, but it's not
impossible at all: many of us have already used systems which use
something like dropbear with temporary key on the install image but
run off openssh in the final system image.

And even when booting off a pre-configured final image we could
easily imagine that the ssh service detects lack of entropy and
runs with a temporary key that is not saved, and in the background
starts a process trying to produce a final key for later use.
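
A minimal sketch of that idea, written in Python for brevity and assuming
a Linux system where os.getrandom() is available: a non-blocking probe
tells the service whether the pool is ready. The function names and the
policy labels below are hypothetical, chosen only for illustration; no
existing SSH daemon is claimed to work this way.

```python
import os


def crng_is_ready() -> bool:
    """Report whether the kernel pool is initialized, without ever blocking."""
    try:
        # With GRND_NONBLOCK, getrandom() fails with EAGAIN instead of
        # sleeping while the pool is still uninitialized.
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False


def choose_host_key_policy(ready: bool) -> str:
    """Pick a key policy; the labels are hypothetical, for illustration."""
    if ready:
        return "persistent"
    # Serve with an unsaved temporary key and create the final key later.
    return "temporary-then-regenerate"
```

A service would call choose_host_key_policy(crng_is_ready()) once at
startup and, in the second case, retry the probe in the background before
writing a long-lived key to disk.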

Willy


* Re: Linux 5.3-rc8
  2019-09-17 16:58                                                                         ` Alexander E. Patrakov
@ 2019-09-17 17:30                                                                           ` Lennart Poettering
  2019-09-17 17:32                                                                             ` Willy Tarreau
  0 siblings, 1 reply; 211+ messages in thread
From: Lennart Poettering @ 2019-09-17 17:30 UTC (permalink / raw)
  To: Alexander E. Patrakov
  Cc: Linus Torvalds, Willy Tarreau, Matthew Garrett, Ahmed S. Darwish,
	Theodore Y. Ts'o, Vito Caputo, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4, lkml

On Tue, 17.09.19 21:58, Alexander E. Patrakov (patrakov@gmail.com) wrote:

> I am worried that the getrandom delays will be serialized, because processes
> sometimes run one after another. If there are enough chained/dependent
> processes that ask for randomness before it is ready, the end result is
> still a too-big delay, essentially a failed boot.
>
> In other words: your approach of adding delays only makes sense for heavily
> parallelized boot, which may not be the case, especially for embedded
> systems that don't like systemd.

As mentioned elsewhere: once the pool is initialized it's
initialized. This means any pending getrandom() on the whole system
will unblock at the same time, and from then on all getrandom()s will
be non-blocking.

systemd-random-seed.service is nowadays a synchronization point for
exactly the moment where the pool is considered full.

Lennart

--
Lennart Poettering, Berlin


* Re: Linux 5.3-rc8
  2019-09-17 17:30                                                                           ` Lennart Poettering
@ 2019-09-17 17:32                                                                             ` Willy Tarreau
  2019-09-17 17:41                                                                               ` Alexander E. Patrakov
  0 siblings, 1 reply; 211+ messages in thread
From: Willy Tarreau @ 2019-09-17 17:32 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Alexander E. Patrakov, Linus Torvalds, Matthew Garrett,
	Ahmed S. Darwish, Theodore Y. Ts'o, Vito Caputo,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	zhangjs, linux-ext4, lkml

On Tue, Sep 17, 2019 at 07:30:36PM +0200, Lennart Poettering wrote:
> On Tue, 17.09.19 21:58, Alexander E. Patrakov (patrakov@gmail.com) wrote:
> 
> > I am worried that the getrandom delays will be serialized, because processes
> > sometimes run one after another. If there are enough chained/dependent
> > processes that ask for randomness before it is ready, the end result is
> > still a too-big delay, essentially a failed boot.
> >
> > In other words: your approach of adding delays only makes sense for heavily
> > parallelized boot, which may not be the case, especially for embedded
> > systems that don't like systemd.
> 
> As mentioned elsewhere: once the pool is initialized it's
> initialized. This means any pending getrandom() on the whole system
> will unblock at the same time, and from then on all getrandom()s will
> be non-blocking.

He means that all processes will experience this delay until there's enough
entropy.

Willy


* Re: Linux 5.3-rc8
  2019-09-17 17:32                                                                             ` Willy Tarreau
@ 2019-09-17 17:41                                                                               ` Alexander E. Patrakov
  0 siblings, 0 replies; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-17 17:41 UTC (permalink / raw)
  To: Willy Tarreau, Lennart Poettering
  Cc: Linus Torvalds, Matthew Garrett, Ahmed S. Darwish,
	Theodore Y. Ts'o, Vito Caputo, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4, lkml

17.09.2019 22:32, Willy Tarreau wrote:
> On Tue, Sep 17, 2019 at 07:30:36PM +0200, Lennart Poettering wrote:
>> On Tue, 17.09.19 21:58, Alexander E. Patrakov (patrakov@gmail.com) wrote:
>>
>>> I am worried that the getrandom delays will be serialized, because processes
>>> sometimes run one after another. If there are enough chained/dependent
>>> processes that ask for randomness before it is ready, the end result is
>>> still a too-big delay, essentially a failed boot.
>>>
>>> In other words: your approach of adding delays only makes sense for heavily
>>> parallelized boot, which may not be the case, especially for embedded
>>> systems that don't like systemd.
>>
>> As mentioned elsewhere: once the pool is initialized it's
>> initialized. This means any pending getrandom() on the whole system
>> will unblock at the same time, and from then on all getrandom()s will
>> be non-blocking.
> 
> He means that all processes will experience this delay until there's enough
> entropy.
> 
> Willy

Indeed, my wording was not clear enough. Linus' patch has a 5-second 
timeout for small entropy requests, after which they get converted to 
the equivalent of urandom. However, in the following shell script:

#!/bin/sh
p1
p2

if both p1 and p2 ask for a small amount of entropy before crng is fully 
initialized, and do nothing that produces more entropy, the total delay 
will be 10 seconds.
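
As a rough model of this arithmetic (a sketch only: it assumes each
process does nothing but wait in getrandom(), that a timed-out request
does not itself help initialize the crng, and that the function names
are made up for illustration):

```python
def serialized_delay(n_procs: int, timeout_s: float, ready_at_s: float) -> float:
    """Total wait when processes run one after another, as in the script."""
    now = 0.0
    for _ in range(n_procs):
        if now < ready_at_s:
            # Each process waits for its own timeout, or until the crng
            # becomes ready, whichever comes first.
            now += min(timeout_s, ready_at_s - now)
    return now


def parallel_delay(n_procs: int, timeout_s: float, ready_at_s: float) -> float:
    """Total wait when all processes start waiting at the same moment."""
    if n_procs == 0:
        return 0.0
    return min(timeout_s, ready_at_s)


never = float("inf")  # the crng never becomes ready on its own
print(serialized_delay(2, 5.0, never))  # p1 then p2: 10.0
print(parallel_delay(2, 5.0, never))    # p1 and p2 together: 5.0
```

With n chained processes the serialized delay grows as n times the
timeout, while in the parallel case it stays at a single timeout.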

-- 
Alexander E. Patrakov


* Re: Linux 5.3-rc8
  2019-09-17 16:23                                                                             ` Linus Torvalds
  2019-09-17 16:34                                                                               ` Reindl Harald
@ 2019-09-17 17:42                                                                               ` Lennart Poettering
  2019-09-17 18:01                                                                                 ` Linus Torvalds
  2019-09-18 19:56                                                                                 ` Eric W. Biederman
  1 sibling, 2 replies; 211+ messages in thread
From: Lennart Poettering @ 2019-09-17 17:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ahmed S. Darwish, Theodore Y. Ts'o, Willy Tarreau,
	Matthew Garrett, Vito Caputo, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Tue, 17.09.19 09:23, Linus Torvalds (torvalds@linux-foundation.org) wrote:

> On Tue, Sep 17, 2019 at 9:08 AM Lennart Poettering <mzxreary@0pointer.de> wrote:
> >
> > Here's what I'd propose:
>
> So I think this is ok, but I have another proposal. Before I post that
> one, though, I just wanted to point out:
>
> > 1) Add GRND_INSECURE to get those users of getrandom() who do not need
> >    high quality entropy off its use (systemd has uses for this, for
> >    seeding hash tables for example), thus reducing the places where
> >    things might block.
>
> I really think that the logic should be the other way around.
>
> The getrandom() users that don't need high quality entropy are the
> ones that don't really think about this, and so _they_ shouldn't be
> the ones that have to explicitly state anything. To those users,
> "random is random". By definition they don't much care, and quite
> possibly they don't even know what "entropy" really means in that
> context.

So I think people nowadays prefer getrandom() over /dev/urandom
primarily because of the noisy logging the kernel does when you use
the latter on a non-initialized pool. If that'd be dropped then I am
pretty sure that the porting from /dev/urandom to getrandom() you see
in various projects (such as gdm/x11) would probably not take place.

In fact, speaking for systemd: the noisy logging in the kernel is the
primary (actually: only) reason that we prefer using RDRAND (if
available) over /dev/urandom if we need "medium quality" random
numbers, for example to seed hash tables and such. If the log message
wasn't there we wouldn't be tempted to bother with RDRAND and would
just use /dev/urandom like we used to for that.
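
For illustration only, the "medium quality" case could look like the
sketch below (Python, assuming Linux; nonblocking_seed is a hypothetical
helper and not systemd code). It never blocks: it first tries getrandom()
with GRND_NONBLOCK and, if the pool is not yet initialized, falls back to
reading /dev/urandom directly, which is the path that triggers the kernel
log message.

```python
import os


def nonblocking_seed(nbytes: int = 16) -> bytes:
    """Return seed bytes for non-cryptographic use (e.g. hash tables)."""
    try:
        # Fails with EAGAIN (BlockingIOError) instead of sleeping when
        # the pool is not initialized yet.
        return os.getrandom(nbytes, os.GRND_NONBLOCK)
    except BlockingIOError:
        # Read the device directly. os.urandom() is avoided here because
        # on Linux it may itself block inside getrandom().
        with open("/dev/urandom", "rb") as dev:
            return dev.read(nbytes)
```

The result must not be used for keys: in the fallback case the bytes may
come from a pool that has not been initialized yet.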

> > 2) Add a kernel log message if a getrandom(0) client hung for 15s or
> >    more, explaining the situation briefly, but not otherwise changing
> >    behaviour.
>
> The problem is that when you have some graphical boot, you'll not even
> see the kernel messages ;(

Well, but as mentioned, there's infrastructure for this, that's why I
suggested changing systemd-random-seed.service.

We can make boot hang in a "sane", discoverable way.

The reason why I think this should also be logged by the kernel is that
people use netconsole and pstore and whatnot and they should see this
there. If systemd with its infrastructure brings this to screen via
plymouth then this wouldn't help people who debug much more low-level.

(I mean, there have been requests to add a logic to systemd that
refuses booting — or delays it — if the system has a battery and it is
nearly empty. I am pretty sure adding a clean, discoverable concept of
"uh, i can't boot for a good reason which is this" wouldn't be the
worst of ideas)

Lennart

--
Lennart Poettering, Berlin


* Re: Linux 5.3-rc8
  2019-09-17 17:20                                                                             ` Matthew Garrett
  2019-09-17 17:23                                                                               ` Matthew Garrett
@ 2019-09-17 17:57                                                                               ` Willy Tarreau
  1 sibling, 0 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-17 17:57 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Linus Torvalds, Martin Steigerwald, Ahmed S. Darwish,
	Theodore Y. Ts'o, Vito Caputo, Lennart Poettering,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml

On Tue, Sep 17, 2019 at 06:20:02PM +0100, Matthew Garrett wrote:
> AES keys are used for a variety of long-lived purposes (eg, disk 
> encryption).

True, good point.

Willy


* Re: Linux 5.3-rc8
  2019-09-17 17:42                                                                               ` Lennart Poettering
@ 2019-09-17 18:01                                                                                 ` Linus Torvalds
  2019-09-17 20:28                                                                                   ` Martin Steigerwald
  2019-09-17 20:58                                                                                   ` Linus Torvalds
  2019-09-18 19:56                                                                                 ` Eric W. Biederman
  1 sibling, 2 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-17 18:01 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Ahmed S. Darwish, Theodore Y. Ts'o, Willy Tarreau,
	Matthew Garrett, Vito Caputo, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Tue, Sep 17, 2019 at 10:42 AM Lennart Poettering
<mzxreary@0pointer.de> wrote:
>
> So I think people nowadays prefer getrandom() over /dev/urandom
> primarily because of the noisy logging the kernel does when you use
> the latter on a non-initialized pool. If that'd be dropped then I am
> pretty sure that the porting from /dev/urandom to getrandom() you see
> in various projects (such as gdm/x11) would probably not take place.

Sad. So people actually are perfectly happy with urandom, but you
don't want the warning, so you use getrandom() and as a result your
boot blocks.

What a sad sad reason for a bug.

Btw, having a "I really don't care deeply about some long-term secure
key" flag would solve that problem too. We'd happily and silently
give you data.

The only reason we _do_ that silly printout for /dev/urandom is
exactly because /dev/random wasn't useful even for the people who
_really_ wanted secure randomness, so they started using /dev/urandom
despite the fact that it didn't necessarily have any entropy at all.

So this all actually fundamentally goes back to the absolutely horrid
and entirely wrong semantics of /dev/random that made it entirely
useless for the only thing it was actually designed for.

This is also an example of how hard-line "security" people that don't
see the shades of gray in between black and white are very much part
of the problem. If you have some unreasonable hard requirements,
you're going to do the wrong thing in the end.

At some point even security people need to realize that reality isn't
black-and-white. It's also keeping us from making any sane
progress, I feel, because of that bogus "entropy is sacred", despite
the fact that our entropy calculations are actually just random
made-up stuff (but "hey, reasonable") to begin with and aren't really
black-and-white themselves.

> In fact, speaking for systemd: the noisy logging in the kernel is the
> primary (actually: only) reason that we prefer using RDRAND (if
> available) over /dev/urandom if we need "medium quality" random
> numbers, for example to seed hash tables and such. If the log message
> wasn't there we wouldn't be tempted to bother with RDRAND and would
> just use /dev/urandom like we used to for that.

That's also very sad. If we have rdrand, we'll actually mix it into
/dev/urandom regardless, so it's again just the whole "people started
using urandom for keys because random was broken" that is the cause of
this all.

I really really detest the whole inflexible "security" mindset.

> We can make boot hang in a "sane", discoverable way.

That is certainly a huge advantage, yes. Right now I suspect that what
has happened is that this has probably been going on as some low-level
background noise for a while, and people either figured it out and
switched away from gdm (example: Christoph), or more likely some
unexplained boot problems that people just didn't chase down. So it
took basically a random happenstance to make this a kernel issue.

But "easily discoverable" would be good.

> The reason why I think this should also be logged by the kernel is that
> people use netconsole and pstore and whatnot and they should see this
> there. If systemd with its infrastructure brings this to screen via
> plymouth then this wouldn't help people who debug much more low-level.

Well, I certainly agree with a kernel message (including a big
WARN_ON_ONCE), but you also point out that the last time we added
helpful messages to let people know, it had some seriously unintended
consequences ;)

              Linus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-17 18:01                                                                                 ` Linus Torvalds
@ 2019-09-17 20:28                                                                                   ` Martin Steigerwald
  2019-09-17 20:52                                                                                     ` Ahmed S. Darwish
  2019-09-17 20:58                                                                                   ` Linus Torvalds
  1 sibling, 1 reply; 211+ messages in thread
From: Martin Steigerwald @ 2019-09-17 20:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Lennart Poettering, Ahmed S. Darwish, Theodore Y. Ts'o,
	Willy Tarreau, Matthew Garrett, Vito Caputo, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

Linus Torvalds - 17.09.19, 20:01:23 CEST:
> > We can make boot hang in a "sane", discoverable way.
> 
> That is certainly a huge advantage, yes. Right now I suspect that what
> has happened is that this has probably been going on as some
> low-level background noise for a while, and people either figured it
> out and switched away from gdm (example: Christoph), or more likely
> some unexplained boot problems that people just didn't chase down. So
> it took basically a random happenstance to make this a kernel issue.
> 
> But "easily discoverable" would be good.

Well I meanwhile remembered how it was with sddm:

Without CPU assistance (RDRAND) or haveged or any other source of
entropy, sddm would simply not appear and I'd see the tty1 login. Then
I start to type something and after a while sddm popped up. If I would
not type anything it easily took at least half a minute till it appeared.

Actually I used my system like this quite a while, cause I did not feel
comfortable with haveged and RDRAND.

AFAIR this was when this Debian still ran with Systemd. What the Debian
maintainer for sddm did was this:

sddm (0.18.0-1) unstable; urgency=medium
[…]
  [ Maximiliano Curia ]
  * Workaround entropy starvation by recommending haveged
  * Release to unstable

 -- Maximiliano Curia […]  Sun, 22 Jul 2018 13:26:44 +0200

With Sysvinit I still have neither haveged nor RDRAND enabled, but
behavior changed a bit. crng init still takes a while

% zgrep -h "crng init" /var/log/kern.log*
Sep 16 09:06:23 merkaba kernel: [   16.910096][    C3] random: crng init done
Sep  8 14:08:39 merkaba kernel: [   16.682014][    C2] random: crng init done
Sep  9 09:16:43 merkaba kernel: [   46.084188][    C2] random: crng init done
Sep 11 10:52:37 merkaba kernel: [   47.209825][    C3] random: crng init done
Sep 12 08:32:08 merkaba kernel: [   76.624375][    C3] random: crng init done
Sep 12 20:07:29 merkaba kernel: [   10.726349][    C2] random: crng init done
Sep  8 10:02:42 merkaba kernel: [   37.391577][    C2] random: crng init done
Aug 26 09:23:51 merkaba kernel: [   40.555337][    C3] random: crng init done
Aug 28 09:45:28 merkaba kernel: [   39.446847][    C1] random: crng init done
Aug 20 10:14:59 merkaba kernel: [   12.242467][    C1] random: crng init done

and there might be a slight delay before sddm appears, before tty has been
initialized. I am not completely sure whether it is related to sddm or
something else. But AFAIR delays have been in the range of a maximum of
5-10 seconds, so I did not bother to check more closely.

Note this is on a ThinkPad T520 which is a PC. And if I read above kernel log
excerpts right, it can still take up to 76 seconds for crng to be initialized with
entropy. Would be interesting to see other people's numbers there.
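
For anyone who wants to collect comparable numbers, here is a minimal
sketch of pulling the uptime out of such log lines. It assumes the
syslog format shown in the excerpt above (a "kernel: [" prefix followed
by the seconds since boot); crng_init_seconds() is a made-up helper
name, not part of any existing tool.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Extract the uptime in seconds from a kernel log line that reports
 * "random: crng init done". Returns -1.0 if the line does not match
 * the assumed syslog format.
 */
double crng_init_seconds(const char *line)
{
	static const char prefix[] = "kernel: [";
	const char *start;
	char *end;
	double secs;

	if (!strstr(line, "random: crng init done"))
		return -1.0;

	start = strstr(line, prefix);
	if (!start)
		return -1.0;
	start += strlen(prefix);

	/* strtod() skips the padding blanks in front of the number. */
	secs = strtod(start, &end);
	if (end == start || *end != ']')
		return -1.0;

	return secs;
}
```

Feeding each line of the zgrep output through this and taking the
maximum would give the worst-case delay for a given machine.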

There might be a different ordering with Sysvinit and it may still be sddm.
But I never have seen a delay of 76 seconds AFAIR… so something else
might be different or I just did not notice the delay. Sometimes I switch
on the laptop and do something else to come back in a minute or so.

I don't have any kernel logs old enough to see whether crng init
times have been different with Systemd due to asking for randomness for
UUID/hashmaps.

Thanks,
-- 
Martin



^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-17 16:21                                                                           ` Willy Tarreau
  2019-09-17 17:13                                                                             ` Lennart Poettering
@ 2019-09-17 20:36                                                                             ` Martin Steigerwald
  1 sibling, 0 replies; 211+ messages in thread
From: Martin Steigerwald @ 2019-09-17 20:36 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Lennart Poettering, Theodore Y. Ts'o, Matthew Garrett,
	Linus Torvalds, Ahmed S. Darwish, Vito Caputo, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

Willy Tarreau - 17.09.19, 18:21:37 CEST:
> On Tue, Sep 17, 2019 at 05:57:43PM +0200, Lennart Poettering wrote:
> > Note that calling getrandom(0) "too early" is not something people
> > do
> > on purpose. It happens by accident, i.e. because we live in a world
> > where SSH or HTTPS or so is run in the initrd already, and in a
> > world
> > where booting sometimes can be very very fast.
> 
> It's not an accident, it's a lack of understanding of the impacts
> from the people who package the systems. Generating an SSH key from
> an initramfs without thinking where the randomness used for this could
> come from is not accidental, it's a lack of experience that will be
> fixed once they start to collect such reports. And those who
> absolutely need their SSH daemon or HTTPS server for a recovery image
> in initramfs can very well feed fake entropy by dumping whatever they
> want into /dev/random to make it possible to build temporary keys for
> use within this single session. At least all supposedly incorrect use
> will be made *on purpose* and will still be possible to match what
> users need.

Well I wondered before whether SSH key generation for cloud init or 
other automatically individualized systems could happen in the 
background. Replacing a key that would be there before it would be 
replaced. So SSH would be available *before* the key is regenerated. But 
then there are those big fat man in the middle warnings… and I have no 
clear idea how to handle this in a way that would both be secure and not 
scare users off too much.

Well probably systems at some point better have good entropy very 
quickly… and that is it. (And then quantum computers may crack those 
good keys anyway in the future.)

-- 
Martin



^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-17 17:29                                                                               ` Willy Tarreau
@ 2019-09-17 20:42                                                                                 ` Martin Steigerwald
  2019-09-18 13:38                                                                                 ` Lennart Poettering
  1 sibling, 0 replies; 211+ messages in thread
From: Martin Steigerwald @ 2019-09-17 20:42 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Lennart Poettering, Theodore Y. Ts'o, Matthew Garrett,
	Linus Torvalds, Ahmed S. Darwish, Vito Caputo, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

Willy Tarreau - 17.09.19, 19:29:29 CEST:
> On Tue, Sep 17, 2019 at 07:13:28PM +0200, Lennart Poettering wrote:
> > On Tue, 17.09.19 18:21, Willy Tarreau (w@1wt.eu) wrote:
> > > On Tue, Sep 17, 2019 at 05:57:43PM +0200, Lennart Poettering 
> > > wrote:
> > > > Note that calling getrandom(0) "too early" is not something
> > > > people do
> > > > on purpose. It happens by accident, i.e. because we live in a
> > > > world
> > > > where SSH or HTTPS or so is run in the initrd already, and in a
> > > > world
> > > > where booting sometimes can be very very fast.
> > > 
> > > It's not an accident, it's a lack of understanding of the impacts
> > > from the people who package the systems. Generating an SSH key
> > > from
> > > an initramfs without thinking where the randomness used for this
> > > could come from is not accidental, it's a lack of experience that
> > > will be fixed once they start to collect such reports. And those
> > > who absolutely need their SSH daemon or HTTPS server for a
> > > recovery image in initramfs can very well feed fake entropy by
> > > dumping whatever they want into /dev/random to make it possible
> > > to build temporary keys for use within this single session. At
> > > least all supposedly incorrect use will be made *on purpose* and
> > > > will still be possible to match what users need.
> > >
> > What do you expect these systems to do though?
> > 
> > I mean, think about general purpose distros: they put together live
> > images that are supposed to work on a myriad of similar (as in: same
> > arch) but otherwise very different systems (i.e. VMs that might lack
> > any form of RNG source the same as beefy servers with multiple
> > sources
> > the same as older netbooks with few and crappy sources, ...). They
> > can't know what the specific hw will provide or won't. It's not
> > their incompetence that they build the image like that. It's a
> > common, very common usecase to install a system via SSH, and it's
> > also very common to have very generic images for a large number of
> > varied systems to run on.
> 
> I'm totally fine with installing the system via SSH, using a temporary
> SSH key. I do make a strong distinction between the installation
> phase and the final deployment. The SSH key used *for installation*
> doesn't need to be the same as the final one. And very often at the
> end of the installation we'll have produced enough entropy to produce
> a correct key.

Well… systems cloud-init adapts may come from the same template. Cloud 
Init thus replaces the key that has been there before on their first 
boot. There is no "installation".

Cloud Init could replace the key in the background… and restart SSH 
then… but that will give those big fat man in the middle warnings and 
all systems would use the same SSH host key initially. I just don't see 
a good way at the moment how to handle this. Introducing an SSH mode for 
"this is still a temporary, not so random key" with proper warnings might 
be challenging to get right from both a security and usability point of 
view. And it would add complexity.

That said with Proxmox VE on Fujitsu S8 or Intel NUCs I have never seen 
this issue even when starting 50 VMs in a row, however, with large cloud 
providers starting 50 VMs in a row does not sound like all that much. 
And I bet with Proxmox VE virtio rng is easily available cause it uses 
KVM.

-- 
Martin



^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-17 20:28                                                                                   ` Martin Steigerwald
@ 2019-09-17 20:52                                                                                     ` Ahmed S. Darwish
  2019-09-17 21:38                                                                                       ` Martin Steigerwald
  0 siblings, 1 reply; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-17 20:52 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Linus Torvalds, Lennart Poettering, Theodore Y. Ts'o,
	Willy Tarreau, Matthew Garrett, Vito Caputo, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

On Tue, Sep 17, 2019 at 10:28:47PM +0200, Martin Steigerwald wrote:
[...]
> 
> I don't have any kernel logs old enough to see whether crng init
> times have been different with Systemd due to asking for randomness for
> UUID/hashmaps.
>

Please stop claiming this. It has been pointed out to you, __multiple
times__, that this makes no difference. For example:

    https://lkml.kernel.org/r/20190916024904.GA22035@mit.edu
    
    No. getrandom(2) uses the new CRNG, which is either initialized,
    or it's not ... So to the extent that systemd has made systems
    boot faster, you could call that systemd's "fault".

You've claimed this like 3 times before in this thread already, and
multiple people replied with the same response. If you don't get the
paragraph above, then please don't continue replying further on this
thread.

thanks,

-- 
Ahmed Darwish
http://darwish.chasingpointers.com

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-17 18:01                                                                                 ` Linus Torvalds
  2019-09-17 20:28                                                                                   ` Martin Steigerwald
@ 2019-09-17 20:58                                                                                   ` Linus Torvalds
  2019-09-18  9:33                                                                                     ` Rasmus Villemoes
  1 sibling, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-17 20:58 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Ahmed S. Darwish, Theodore Y. Ts'o, Willy Tarreau,
	Matthew Garrett, Vito Caputo, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

Side note, and entirely unrelated to this particular problem, but
_because_ I was looking at the entropy init and sources of randomness
we have, I notice that we still don't use the ToD clock as a source.

There's not a whole lot of bits there, but at least one of the attacks
against entirely missing boot-time randomness was to look at the
output of get_random_bytes(), and just compare it across machines. We
sanitize things by going through a cryptographic hash function, and
that helps hide the internal entropy buffers from direct viewing, but
it still leaves the "are those internal entropy buffers the _same_
across machines" for the nasty embedded hardware case with identical
hardware.

Of course, some of those machines didn't even have a time-of-day
clock either. But the fact that some didn't doesn't mean we shouldn't
take it into account.

So adding a "add_device_randomness()" to do_settimeofday64() (which
catches them all) wouldn't be a bad idea. Not perhaps "entropy", but
helping against detecting the case of basically very limited entropy
at all at early boot.

I'm pretty sure we discussed that case when we did those things
originally, but I don't actually see us doing it anywhere right now.

So we definitely have some sources of differences for different
systems that we could/should use, even if we might not be able to
really account them as "entropy". The whole "people generated a number
of the same keys" is just horrendously bad, even if they were to use
/dev/urandom that doesn't have any strict entropy guarantees.
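
To make the "same buffers across machines" point concrete, here is a
userspace toy, not kernel code. The pool is a plain FNV-1a hash state
standing in for the real input pool (FNV-1a is not cryptographic), and
toy_pool_mix() only imitates the "mix in, credit nothing" behavior of
add_device_randomness(). All names and the firmware string are made up
for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the input pool: a 64-bit FNV-1a state. NOT crypto. */
struct toy_pool {
	uint64_t state;
};

void toy_pool_init(struct toy_pool *p)
{
	p->state = 0xcbf29ce484222325ULL;	/* FNV-1a offset basis */
}

/* Mix bytes into the pool without crediting any entropy. */
void toy_pool_mix(struct toy_pool *p, const void *buf, size_t len)
{
	const unsigned char *b = buf;
	size_t i;

	for (i = 0; i < len; i++) {
		p->state ^= b[i];
		p->state *= 0x100000001b3ULL;	/* FNV-1a prime */
	}
}

/*
 * Simulate an early boot on identical hardware. Every machine mixes in
 * the same firmware data; if tod_seconds is non-NULL, the wall-clock
 * value is mixed in as well. Returns the resulting pool state.
 */
uint64_t toy_boot(const int64_t *tod_seconds)
{
	static const char firmware_id[] = "identical-embedded-board";
	struct toy_pool p;

	toy_pool_init(&p);
	toy_pool_mix(&p, firmware_id, sizeof(firmware_id));
	if (tod_seconds)
		toy_pool_mix(&p, tod_seconds, sizeof(*tod_seconds));
	return p.state;
}
```

Two calls to toy_boot(NULL) return the same state, which is the
identical-keys case; two boots whose clocks differ by even one second
end up with different states, without anyone having to claim the clock
is worth any "entropy".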

               Linus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-17 20:52                                                                                     ` Ahmed S. Darwish
@ 2019-09-17 21:38                                                                                       ` Martin Steigerwald
  2019-09-17 21:52                                                                                         ` Matthew Garrett
  2019-09-18 13:40                                                                                         ` Lennart Poettering
  0 siblings, 2 replies; 211+ messages in thread
From: Martin Steigerwald @ 2019-09-17 21:38 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Linus Torvalds, Lennart Poettering, Theodore Y. Ts'o,
	Willy Tarreau, Matthew Garrett, Vito Caputo, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

Ahmed S. Darwish - 17.09.19, 22:52:34 CEST:
> On Tue, Sep 17, 2019 at 10:28:47PM +0200, Martin Steigerwald wrote:
> [...]
> 
> > I don't have any kernel logs old enough to see whether crng
> > init times have been different with Systemd due to asking for
> > randomness for UUID/hashmaps.
> 
> Please stop claiming this. It has been pointed out to you, __multiple
> times__, that this makes no difference. For example:
> 
>     https://lkml.kernel.org/r/20190916024904.GA22035@mit.edu
> 
>     No. getrandom(2) uses the new CRNG, which is either initialized,
>     or it's not ... So to the extent that systemd has made systems
>     boot faster, you could call that systemd's "fault".
> 
> You've claimed this like 3 times before in this thread already, and
> multiple people replied with the same response. If you don't get the
> paragraph above, then please don't continue replying further on this
> thread.

First off, this mail you referenced has not been an answer to a mail of 
mine. It does not have my mail address in Cc. So no, it has not been 
pointed out directly to me in that mail.

Secondly: Pardon me, but I do not see how asking for entropy early at 
boot times or not doing so has *no effect* on the available entropy¹. And 
I do not see the above mail actually saying this. To my knowledge 
Sysvinit does not need entropy for itself². The above mail merely talks 
about the blocking on boot. And whether systemd-random-seed would drain 
entropy, not whether hashmaps/UUID do. And also not the effect that 
asking for entropy early has on the available entropy and on the 
*initial* initialization time of the new CRNG. However I did not claim 
that Systemd would block booting. *Not at all*.

Thirdly: I disagree with the tone you use in your mail. And for that 
alone I feel it may be better for me to let go of this discussion.

My understanding of entropy always has been that only a certain amount 
of it can be produced in a certain amount of time. If that is wrong… 
please by all means, please teach me, how it would be.

However I am not even claiming anything. All I wrote above is that I do 
not have any measurements. But I'd expect that the more entropy is asked 
for early during boot, the longer the initial initialization of the new 
CRNG will take. And if someone else relies on this initialization, that 
something else would block for a longer time.

I got that the new crng won't block after that anymore.

[1] https://github.com/systemd/systemd/issues/4167

(I know that it is still with /dev/urandom, so if it is using RDRAND now, 
this may indeed be different, but would it then deplete entropy the CPU 
has available and that by default is fed into the Linux crng as well 
(even without trusting it completely)?)

[2] According to

https://daniel-lange.com/archives/152-Openssh-taking-minutes-to-become-available,-booting-takes-half-an-hour-...-because-your-server-waits-for-a-few-bytes-of-randomness.html

sysvinit does not contain a single line of code about entropy or random 
numbers.

Daniel even updated his blog post with a hint to this discussion.

Thanks,
-- 
Martin



^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-17 21:38                                                                                       ` Martin Steigerwald
@ 2019-09-17 21:52                                                                                         ` Matthew Garrett
  2019-09-17 22:10                                                                                           ` Martin Steigerwald
  2019-09-17 23:08                                                                                           ` Linus Torvalds
  2019-09-18 13:40                                                                                         ` Lennart Poettering
  1 sibling, 2 replies; 211+ messages in thread
From: Matthew Garrett @ 2019-09-17 21:52 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Ahmed S. Darwish, Linus Torvalds, Lennart Poettering,
	Theodore Y. Ts'o, Willy Tarreau, Vito Caputo, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

On Tue, Sep 17, 2019 at 11:38:33PM +0200, Martin Steigerwald wrote:

> My understanding of entropy always has been that only a certain amount 
> of it can be produced in a certain amount of time. If that is wrong… 
> please by all means, please teach me, how it would be.

getrandom() will never "consume entropy" in a way that will block any 
users of getrandom(). If you don't have enough collected entropy to seed 
the rng, getrandom() will block. If you do, getrandom() will generate as 
many numbers as you ask it to, even if no more entropy is ever collected 
by the system. So it doesn't matter how many clients you have calling 
getrandom() in the boot process - either there'll be enough entropy 
available to satisfy all of them, or there'll be too little to satisfy 
any of them.
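
A minimal userspace sketch of that behavior, assuming glibc's
<sys/random.h> wrapper (glibc 2.25 or later); crng_is_seeded() and
fill_random() are illustrative helper names, not existing APIs:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Probe the CRNG without blocking.
 * Returns 1 if it is seeded, 0 if it is not yet seeded (EAGAIN),
 * and -1 on any other error.
 */
int crng_is_seeded(void)
{
	unsigned char probe;
	ssize_t n;

	do {
		n = getrandom(&probe, sizeof(probe), GRND_NONBLOCK);
	} while (n < 0 && errno == EINTR);

	if (n == (ssize_t)sizeof(probe))
		return 1;
	return (n < 0 && errno == EAGAIN) ? 0 : -1;
}

/*
 * Fill buf with len bytes using getrandom() with flags == 0.
 * This can only block while the CRNG is still unseeded; once it is
 * seeded, any amount can be requested.
 * Returns 0 on success, -1 on error.
 */
int fill_random(void *buf, size_t len)
{
	unsigned char *p = buf;

	while (len > 0) {
		ssize_t n = getrandom(p, len, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

Once crng_is_seeded() has returned 1, fill_random() can be called by
any number of processes for any amount of data without any of them
blocking; before that point every flags == 0 caller waits, no matter
how little it asks for.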

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-17 21:52                                                                                         ` Matthew Garrett
@ 2019-09-17 22:10                                                                                           ` Martin Steigerwald
  2019-09-18 13:53                                                                                             ` Lennart Poettering
  2019-09-17 23:08                                                                                           ` Linus Torvalds
  1 sibling, 1 reply; 211+ messages in thread
From: Martin Steigerwald @ 2019-09-17 22:10 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Ahmed S. Darwish, Linus Torvalds, Lennart Poettering,
	Theodore Y. Ts'o, Willy Tarreau, Vito Caputo, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

Matthew Garrett - 17.09.19, 23:52:00 CEST:
> On Tue, Sep 17, 2019 at 11:38:33PM +0200, Martin Steigerwald wrote:
> > My understanding of entropy always has been that only a certain
> > amount of it can be produced in a certain amount of time. If that
> > is wrong… please by all means, please teach me, how it would be.
> 
> getrandom() will never "consume entropy" in a way that will block any
> users of getrandom(). If you don't have enough collected entropy to
> seed the rng, getrandom() will block. If you do, getrandom() will
> generate as many numbers as you ask it to, even if no more entropy is
> ever collected by the system. So it doesn't matter how many clients
> you have calling getrandom() in the boot process - either there'll be
> enough entropy available to satisfy all of them, or there'll be too
> little to satisfy any of them.

Right, but then Systemd would not use getrandom() for initial hashmap/
UUID stuff since it

1) would block boot very early then, which is not desirable and

2) it does not need strong random numbers anyway.

At least that is how I understood Lennart's comments on the Systemd bug 
report I referenced.

AFAIK hashmap/UUID stuff uses *some* entropy *before* crng has been 
seeded with entropy and all I wondered was whether this using *some* 
entropy *before* crng has been seeded – by /dev/urandom initially, but 
now as far as I got with RDRAND if available – will delay the process of 
gathering the entropy  necessary to seed crng… if that is the case then 
anything that uses crng during or soon after boot, like gdm, sddm, 
OpenSSH ssh-keygen will be blocked for a longer time until the initial 
seeding of crng has been done.

Of course if hashmap/UUID stuff does not use any entropy that would be 
required for the *initial* seeding of crng, then… that would not be the 
case. But from what I understood, it does.

And yes, for "systemd-random-seed" it is true that it does not drain 
entropy for getrandom, cause it writes the seed to disk *after* crng has 
been initialized, i.e. at a time where getrandom would never block again 
as long as the system is running.

If I am still completely misunderstanding something there, then it may 
be better to go to sleep. Which I will do now anyway.

Or I may just not be very good at explaining what I mean.

-- 
Martin



^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-17 21:52                                                                                         ` Matthew Garrett
  2019-09-17 22:10                                                                                           ` Martin Steigerwald
@ 2019-09-17 23:08                                                                                           ` Linus Torvalds
  1 sibling, 0 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-17 23:08 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Martin Steigerwald, Ahmed S. Darwish, Lennart Poettering,
	Theodore Y. Ts'o, Willy Tarreau, Vito Caputo, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

On Tue, Sep 17, 2019 at 2:52 PM Matthew Garrett <mjg59@srcf.ucam.org> wrote:
>
> getrandom() will never "consume entropy" in a way that will block any
> users of getrandom().

Yes, this is true for any common and sane use.

And by that I just mean that we do have GRND_RANDOM, which currently
does exactly that entropy consumption.

But it only consumes it for other GRND_RANDOM users - of which there
are approximately zero, because nobody wants that rats nest.

                Linus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-17 20:58                                                                                   ` Linus Torvalds
@ 2019-09-18  9:33                                                                                     ` Rasmus Villemoes
  2019-09-18 10:16                                                                                       ` Willy Tarreau
  2019-09-18 19:31                                                                                       ` Linus Torvalds
  0 siblings, 2 replies; 211+ messages in thread
From: Rasmus Villemoes @ 2019-09-18  9:33 UTC (permalink / raw)
  To: Linus Torvalds, Lennart Poettering
  Cc: Ahmed S. Darwish, Theodore Y. Ts'o, Willy Tarreau,
	Matthew Garrett, Vito Caputo, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On 17/09/2019 22.58, Linus Torvalds wrote:
> Side note, and entirely unrelated to this particular problem, but
> _because_ I was looking at the entropy init and sources of randomness
> we have, I notice that we still don't use the ToD clock as a source.

And unrelated to the non-use of the RTC (which I agree seems weird), but
because there's no better place in this thread: How "random" is the
contents of RAM after boot? Sure, for virtualized environments one
probably always gets zeroed pages from the host (otherwise the host has
a problem...), and on PCs maybe the BIOS interferes.

But for cheap embedded devices with non-ECC RAM and not a lot of
value-add firmware between power-on and start_kernel(), would it make
sense to read a few MB of memory outside of where the kernel was loaded
and feed those to add_device_randomness() (of course, doing it as early
as possible, maybe first thing in start_kernel())? Or do the reading in
the bootloader and pass on the sha256() in the DT/rng-seed property?

A quick "kitchen-table" experiment with the board I have on my desk
shows that there is at least some randomness to be had after a cold boot.

Maybe this has already been suggested and rejected?

Rasmus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-18  9:33                                                                                     ` Rasmus Villemoes
@ 2019-09-18 10:16                                                                                       ` Willy Tarreau
  2019-09-18 10:25                                                                                         ` Alexander E. Patrakov
  2019-09-18 19:31                                                                                       ` Linus Torvalds
  1 sibling, 1 reply; 211+ messages in thread
From: Willy Tarreau @ 2019-09-18 10:16 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Linus Torvalds, Lennart Poettering, Ahmed S. Darwish,
	Theodore Y. Ts'o, Matthew Garrett, Vito Caputo,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	Alexander E. Patrakov, zhangjs, linux-ext4, lkml

On Wed, Sep 18, 2019 at 11:33:39AM +0200, Rasmus Villemoes wrote:
> On 17/09/2019 22.58, Linus Torvalds wrote:
> > Side note, and entirely unrelated to this particular problem, but
> > _because_ I was looking at the entropy init and sources of randomness
> > we have, I notice that we still don't use the ToD clock as a source.
> 
> And unrelated to the non-use of the RTC (which I agree seems weird), but
> because there's no better place in this thread: How "random" is the
> contents of RAM after boot? Sure, for virtualized environments one
> probably always gets zeroed pages from the host (otherwise the host has
> a problem...), and on PCs maybe the BIOS interferes.
>
> But for cheap embedded devices with non-ECC RAM and not a lot of
> value-add firmware between power-on and start_kernel(), would it make
> sense to read a few MB of memory outside of where the kernel was loaded
> and feed those to add_device_randomness() (of course, doing it as early
> as possible, maybe first thing in start_kernel())? Or do the reading in
> the bootloader and pass on the sha256() in the DT/rng-seed property?
> 
> A quick "kitchen-table" experiment with the board I have on my desk
> shows that there is at least some randomness to be had after a cold boot.
> 
> Maybe this has already been suggested and rejected?

We've already discussed that point a few times. The issue is that
bootloaders and/or BIOSes tend to wipe everything. Ideally we should
let the boot loader collect entropy from the DDR training phase since
it's a period where noise is observed. It's also the right moment to
collect some random contents that may lie in the RAM cells.

Similarly asynchronous clocks driving external components can be used
as well if you can measure their phase with the CPU's clock.

Regards,
Willy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-18 10:16                                                                                       ` Willy Tarreau
@ 2019-09-18 10:25                                                                                         ` Alexander E. Patrakov
  2019-09-18 10:42                                                                                           ` Willy Tarreau
  0 siblings, 1 reply; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-18 10:25 UTC (permalink / raw)
  To: Willy Tarreau, Rasmus Villemoes
  Cc: Linus Torvalds, Lennart Poettering, Ahmed S. Darwish,
	Theodore Y. Ts'o, Matthew Garrett, Vito Caputo,
	Andreas Dilger, Jan Kara, Ray Strode, William Jon McCann,
	zhangjs, linux-ext4, lkml


[-- Attachment #1: Type: text/plain, Size: 1394 bytes --]

18.09.2019 15:16, Willy Tarreau wrote:
> We've already discussed that point a few times. The issue is that
> bootloaders and/or BIOSes tend to wipe everything. Ideally we should
> let the boot loader collect entropy from the DDR training phase since
> it's a period where noise is observed. It's also the right moment to
> collect some random contents that may lie in the RAM cells.
> 
> Similarly asynchronous clocks driving external components can be used
> as well if you can measure their phase with the CPU's clock.

This does not match my own observations. I have a setup where a
secondary key is kept in RAM so that a LUKS container can be unlocked
after a reboot. I documented it (sorry, in Russian only) here:
https://habr.com/ru/post/457396/ . I will publish an English translation
on my blog if I get at least one request (by private email, please).

The results so far are:

1. Desktop with MSI Z87I board: works.
2. Lenovo Yoga 2 Pro laptop: works.
3. Server based on the Intel Corporation S1200SPL board (available from 
OVH as EG-32): does not work, memory is cleared.
4. Cheap server based on Gooxi G1SCN-B board (the cheapest thing with
IPMI available on bacloud.com): works.

So that's a 75% success rate (at least one page was found preserved
after the "reboot" command) on my samples.

-- 
Alexander E. Patrakov


[-- Attachment #2: S/MIME cryptographic signature --]
[-- Type: application/pkcs7-signature, Size: 4052 bytes --]

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-18 10:25                                                                                         ` Alexander E. Patrakov
@ 2019-09-18 10:42                                                                                           ` Willy Tarreau
  0 siblings, 0 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-18 10:42 UTC (permalink / raw)
  To: Alexander E. Patrakov
  Cc: Rasmus Villemoes, Linus Torvalds, Lennart Poettering,
	Ahmed S. Darwish, Theodore Y. Ts'o, Matthew Garrett,
	Vito Caputo, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml

On Wed, Sep 18, 2019 at 03:25:51PM +0500, Alexander E. Patrakov wrote:
> The results so far are:
> 
> 1. Desktop with MSI Z87I board: works.
> 2. Lenovo Yoga 2 Pro laptop: works.
> 3. Server based on the Intel Corporation S1200SPL board (available from OVH
> as EG-32): does not work, memory is cleared.
> 4. Cheap server based on Gooxi G1SCN-B board (the cheapest thing with IPMI
> available on bacloud.com): works.
> 
> So that's 75% of success stories (found at least one page that is preserved
> after the "reboot" command) based on my samples.

That's pretty good! I wasn't this lucky whenever I tried this in the
past :-/ I do remember noticing that the video RAM of graphics cards
was often usable, however; I figured that out after seeing a ghost
image from a previous boot when switching to graphics mode.

Willy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-17 17:29                                                                               ` Willy Tarreau
  2019-09-17 20:42                                                                                 ` Martin Steigerwald
@ 2019-09-18 13:38                                                                                 ` Lennart Poettering
  2019-09-18 13:59                                                                                   ` Alexander E. Patrakov
  1 sibling, 1 reply; 211+ messages in thread
From: Lennart Poettering @ 2019-09-18 13:38 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Theodore Y. Ts'o, Matthew Garrett, Linus Torvalds,
	Ahmed S. Darwish, Vito Caputo, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, lkml

On Tue, 17.09.19 19:29, Willy Tarreau (w@1wt.eu) wrote:

> > What do you expect these systems to do though?
> >
> > I mean, think about general purpose distros: they put together live
> > images that are supposed to work on a myriad of similar (as in: same
> > arch) but otherwise very different systems (i.e. VMs that might lack
> > any form of RNG source the same as beefy servers with multiple sources
> > the same as older netbooks with few and crappy sources, ...). They can't
> > know what the specific hw will provide or won't. It's not their
> > incompetence that they build the image like that. It's a common, very
> > common usecase to install a system via SSH, and it's also very common
> > to have very generic images for a large number of varied systems to run
> > on.
>
> I'm totally fine with installing the system via SSH, using a temporary
> SSH key. I do make a strong distinction between the installation phase
> and the final deployment. The SSH key used *for installation* doesn't
> need to be the same as the final one. And very often at the end of the
> installation we'll have produced enough entropy to produce a correct
> key.

That's not how systems are built today though. And I am not sure they
should be. I mean, the majority of systems at this point probably have
some form of hardware (or virtualized) RNG available (even raspi has
one these days!), so generating these keys once at boot is totally
OK. Probably a number of others need just a few seconds to get the
entropy needed, where things are totally OK too. The only problem is
systems that lack any reasonable source of entropy and where
initialization of the pool will take overly long.

I figure we can reduce the number of systems where entropy is scarce
quite a bit if we'd start crediting entropy by default from various hw
rngs we currently don't credit entropy for. For example, the TPM and
older intel/amd chipsets. You currently have to specify
rng_core.default_quality=1000 on the kernel cmdline to make them
credit entropy. I am pretty sure this should be the default now, in a
world where CONFIG_RANDOM_TRUST_CPU=y is set anyway. i.e. why say
RDRAND is fine but those chipsets are not? That makes no sense to me.

I am very sure that crediting entropy to chipset hwrngs is a much
better way to solve the issue on those systems than to just hand out
rubbish randomness.

Lennart

--
Lennart Poettering, Berlin

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-17 21:38                                                                                       ` Martin Steigerwald
  2019-09-17 21:52                                                                                         ` Matthew Garrett
@ 2019-09-18 13:40                                                                                         ` Lennart Poettering
  1 sibling, 0 replies; 211+ messages in thread
From: Lennart Poettering @ 2019-09-18 13:40 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Ahmed S. Darwish, Linus Torvalds, Theodore Y. Ts'o,
	Willy Tarreau, Matthew Garrett, Vito Caputo, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

On Tue, 17.09.19 23:38, Martin Steigerwald (martin@lichtvoll.de) wrote:

> (I know that it still with /dev/urandom, so if it is using RDRAND now,
> this may indeed be different, but would it then deplete entropy the CPU
> has available and that by default is fed into the Linux crng as well
> (even without trusting it completely)?)

Neither RDRAND nor /dev/urandom has a concept of "depleting
entropy". That concept does not exist for them. It does exist for
/dev/random, but only crazy people use that. systemd does not.

Lennart

--
Lennart Poettering, Berlin

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-17 22:10                                                                                           ` Martin Steigerwald
@ 2019-09-18 13:53                                                                                             ` Lennart Poettering
  2019-09-19  7:28                                                                                               ` Martin Steigerwald
  0 siblings, 1 reply; 211+ messages in thread
From: Lennart Poettering @ 2019-09-18 13:53 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Matthew Garrett, Ahmed S. Darwish, Linus Torvalds,
	Theodore Y. Ts'o, Willy Tarreau, Vito Caputo, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

On Wed, 18.09.19 00:10, Martin Steigerwald (martin@lichtvoll.de) wrote:

> > getrandom() will never "consume entropy" in a way that will block any
> > users of getrandom(). If you don't have enough collected entropy to
> > seed the rng, getrandom() will block. If you do, getrandom() will
> > generate as many numbers as you ask it to, even if no more entropy is
> > ever collected by the system. So it doesn't matter how many clients
> > you have calling getrandom() in the boot process - either there'll be
> > enough entropy available to satisfy all of them, or there'll be too
> > little to satisfy any of them.
>
> Right, but then Systemd would not use getrandom() for initial hashmap/
> UUID stuff since it

Actually things are more complex. In systemd there are four classes of
random values we need:

1. High "cryptographic" quality. There are very few needs for this in
   systemd, as we do very little in this area. It's basically only
   used for generating salt values for hashed passwords, in the
   systemd-firstboot component, which can be used to set the root
   pw. systemd uses synchronous getrandom() for this. It does not use
   RDRAND for this.

2. High "non-cryptographic" quality. This is used for example for
   generating type 4 UUIDs, i.e. UUIDs that are supposed to be globally
   unique, but aren't key material. We use RDRAND for this if
   available, falling back to synchronous getrandom(). Type 4 UUIDs
   are frequently needed by systemd, as we assign a UUID to each
   service invocation implicitly, so that people can match logging
   data and such to a specific instance and runtime of a service.

3. Medium quality. This is used for seeding hash tables. These may be
   crap initially, but should not be guessable in the long
   run. /dev/urandom would be perfect for this, but the mentioned log
   message sucks, hence we use RDRAND for this if available, and fall
   back to /dev/urandom if that isn't available, accepting the log
   message.

4. Crap quality. There are only a few uses of this, where rand_r() is
   OK.

Of these four cases, the first two might block boot. Because the first
case is not common, you won't see it block that often. The second case
is very common, but since we use RDRAND you won't see it block on any
recent Intel machines.

Or to say this all differently: the hash table seeding and the uuid
case are two distinct cases in systemd, and I am sure they should be.
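
To make the distinction between the blocking and non-blocking classes
concrete, here is a minimal userspace sketch. It is not systemd's code:
the function names are made up for illustration, RDRAND is left out, and
a non-blocking getrandom() attempt stands in for it before the
/dev/urandom fallback described in class 3.

```python
import os


def medium_quality_bytes(n: int) -> bytes:
    """Return n bytes (n assumed small) without waiting for the pool.

    Hypothetical helper for "class 3" style uses such as hash table seeds.
    """
    if hasattr(os, "getrandom"):
        try:
            # GRND_NONBLOCK makes getrandom() fail with EAGAIN instead of
            # sleeping while the kernel pool is still uninitialized.
            return os.getrandom(n, os.GRND_NONBLOCK)
        except BlockingIOError:
            pass  # pool not ready yet: fall through to /dev/urandom
    # Reading the device node directly does not wait for initialization.
    # os.urandom() is avoided here because Python may implement it with a
    # blocking getrandom() call.
    with open("/dev/urandom", "rb") as dev:
        return dev.read(n)


def crypto_quality_bytes(n: int) -> bytes:
    """Return n bytes, sleeping until the pool is initialized if needed.

    Hypothetical helper for "class 1" style uses such as password salts.
    """
    return os.getrandom(n, 0)
```

Only crypto_quality_bytes() can stall early boot here; the other helper
trades quality for availability, which is the trade-off the hash table
case accepts.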

Lennart

--
Lennart Poettering, Berlin

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-18 13:38                                                                                 ` Lennart Poettering
@ 2019-09-18 13:59                                                                                   ` Alexander E. Patrakov
  2019-09-18 14:50                                                                                     ` Alexander E. Patrakov
  0 siblings, 1 reply; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-18 13:59 UTC (permalink / raw)
  To: Lennart Poettering, Willy Tarreau
  Cc: Theodore Y. Ts'o, Matthew Garrett, Linus Torvalds,
	Ahmed S. Darwish, Vito Caputo, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4, lkml


[-- Attachment #1: Type: text/plain, Size: 2901 bytes --]

18.09.2019 18:38, Lennart Poettering wrote:
> On Tue, 17.09.19 19:29, Willy Tarreau (w@1wt.eu) wrote:
> 
>>> What do you expect these systems to do though?
>>>
>>> I mean, think about general purpose distros: they put together live
>>> images that are supposed to work on a myriad of similar (as in: same
>>> arch) but otherwise very different systems (i.e. VMs that might lack
>>> any form of RNG source the same as beefy servers with multiple sources
>>> the same as older netbooks with few and crappy sources, ...). They can't
>>> know what the specific hw will provide or won't. It's not their
>>> incompetence that they build the image like that. It's a common, very
>>> common usecase to install a system via SSH, and it's also very common
>>> to have very generic images for a large number of varied systems to run
>>> on.
>>
>> I'm totally fine with installing the system via SSH, using a temporary
>> SSH key. I do make a strong distinction between the installation phase
>> and the final deployment. The SSH key used *for installation* doesn't
>> need to be the same as the final one. And very often at the end of the
>> installation we'll have produced enough entropy to produce a correct
>> key.
> 
> That's not how systems are built today though. And I am not sure they
> should be. I mean, the majority of systems at this point probably have
> some form of hardware (or virtualized) RNG available (even raspi has
> one these days!), so generating these keys once at boot is totally
> OK. Probably a number of others need just a few seconds to get the
> entropy needed, where things are totally OK too. The only problem is
> systems that lack any reasonable source of entropy and where
> initialization of the pool will take overly long.
> 
> I figure we can reduce the number of systems where entropy is scarce
> quite a bit if we'd start crediting entropy by default from various hw
> rngs we currently don't credit entropy for. For example, the TPM and
> older intel/amd chipsets. You currently have to specify
> rng_core.default_quality=1000 on the kernel cmdline to make them
> credit entropy. I am pretty sure this should be the default now, in a
> world where CONFIG_RANDOM_TRUST_CPU=y is set anyway. i.e. why say
> RDRAND is fine but those chipsets are not? That makes no sense to me.
> 
> I am very sure that crediting entropy to chipset hwrngs is a much
> better way to solve the issue on those systems than to just hand out
> rubbish randomness.

Very well said. However, 1000 is more than the hard-coded quality of 
some existing rngs, and so would send a misleading message that they are 
somehow worse. I would suggest case-by-case reevaluation of all existing 
hwrng drivers by their maintainers, and then setting the default to 
something like 899, so that evaluated drivers have priority.

-- 
Alexander E. Patrakov


[-- Attachment #2: S/MIME cryptographic signature --]
[-- Type: application/pkcs7-signature, Size: 4052 bytes --]

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-18 13:59                                                                                   ` Alexander E. Patrakov
@ 2019-09-18 14:50                                                                                     ` Alexander E. Patrakov
  0 siblings, 0 replies; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-18 14:50 UTC (permalink / raw)
  To: Lennart Poettering, Willy Tarreau
  Cc: Theodore Y. Ts'o, Matthew Garrett, Linus Torvalds,
	Ahmed S. Darwish, Vito Caputo, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, zhangjs, linux-ext4, lkml


[-- Attachment #1: Type: text/plain, Size: 3439 bytes --]

18.09.2019 18:59, Alexander E. Patrakov wrote:
> 18.09.2019 18:38, Lennart Poettering wrote:
>> On Tue, 17.09.19 19:29, Willy Tarreau (w@1wt.eu) wrote:
>>
>>>> What do you expect these systems to do though?
>>>>
>>>> I mean, think about general purpose distros: they put together live
>>>> images that are supposed to work on a myriad of similar (as in: same
>>>> arch) but otherwise very different systems (i.e. VMs that might lack
>>>> any form of RNG source the same as beefy servers with multiple sources
>>>> the same as older netbooks with few and crappy sources, ...). They can't
>>>> know what the specific hw will provide or won't. It's not their
>>>> incompetence that they build the image like that. It's a common, very
>>>> common usecase to install a system via SSH, and it's also very common
>>>> to have very generic images for a large number of varied systems to run
>>>> on.
>>>
>>> I'm totally fine with installing the system via SSH, using a temporary
>>> SSH key. I do make a strong distinction between the installation phase
>>> and the final deployment. The SSH key used *for installation* doesn't
>>> need to be the same as the final one. And very often at the end of the
>>> installation we'll have produced enough entropy to produce a correct
>>> key.
>>
>> That's not how systems are built today though. And I am not sure they
>> should be. I mean, the majority of systems at this point probably have
>> some form of hardware (or virtualized) RNG available (even raspi has
>> one these days!), so generating these keys once at boot is totally
>> OK. Probably a number of others need just a few seconds to get the
>> entropy needed, where things are totally OK too. The only problem is
>> systems that lack any reasonable source of entropy and where
>> initialization of the pool will take overly long.
>>
>> I figure we can reduce the number of systems where entropy is scarce
>> quite a bit if we'd start crediting entropy by default from various hw
>> rngs we currently don't credit entropy for. For example, the TPM and
>> older intel/amd chipsets. You currently have to specify
>> rng_core.default_quality=1000 on the kernel cmdline to make them
>> credit entropy. I am pretty sure this should be the default now, in a
>> world where CONFIG_RANDOM_TRUST_CPU=y is set anyway. i.e. why say
>> RDRAND is fine but those chipsets are not? That makes no sense to me.
>>
>> I am very sure that crediting entropy to chipset hwrngs is a much
>> better way to solve the issue on those systems than to just hand out
>> rubbish randomness.
> 
> Very well said. However, 1000 is more than the hard-coded quality of 
> some existing rngs, and so would send a misleading message that they are 
> somehow worse. I would suggest case-by-case reevaluation of all existing 
> hwrng drivers by their maintainers, and then setting the default to 
> something like 899, so that evaluated drivers have priority.
> 

Well, I have to provide another data point. On Arch Linux and MSI Z87I 
desktop board:

$ lsmod | grep rng
<nothing>
$ modinfo rng_core
<yes, the module does exist>

So this particular board has no sources of randomness except interrupts 
(which are scarce), RDRAND (which is not trusted in Arch Linux by 
default) and jitter entropy (which is not collected by the kernel and 
needs haveged or equivalent).
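
For anyone who wants to repeat this check on another board, here is a
small sketch that reads what the hwrng core exposes in sysfs instead of
grepping lsmod. It assumes the usual /sys/class/misc/hw_random location;
the function name and the returned dictionary layout are made up for
illustration.

```python
from pathlib import Path

# Usual sysfs location of the hwrng core; absent if rng_core is not loaded.
HWRNG_SYSFS = Path("/sys/class/misc/hw_random")


def hwrng_status(sysfs_dir: Path = HWRNG_SYSFS) -> dict:
    """Report registered hardware RNGs and the one currently selected."""

    def read(name: str) -> str:
        try:
            return (sysfs_dir / name).read_text().strip()
        except OSError:
            # Missing directory or attribute: treat as "nothing registered".
            return ""

    current = read("rng_current")
    return {
        "available": read("rng_available").split(),
        "current": None if current in ("", "none") else current,
    }


if __name__ == "__main__":
    print(hwrng_status())
```

An empty "available" list means no hwrng driver is registered with the
core, so there is nothing for a default quality setting to apply to.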

-- 
Alexander E. Patrakov


[-- Attachment #2: S/MIME cryptographic signature --]
[-- Type: application/pkcs7-signature, Size: 4052 bytes --]

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-18  9:33                                                                                     ` Rasmus Villemoes
  2019-09-18 10:16                                                                                       ` Willy Tarreau
@ 2019-09-18 19:31                                                                                       ` Linus Torvalds
  1 sibling, 0 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-18 19:31 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Lennart Poettering, Ahmed S. Darwish, Theodore Y. Ts'o,
	Willy Tarreau, Matthew Garrett, Vito Caputo, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

On Wed, Sep 18, 2019 at 2:33 AM Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
>
> And unrelated to the non-use of the RTC (which I agree seems weird), but
> because there's no better place in this thread: How "random" is the
> contents of RAM after boot?

It varies all over the place.

Some machines will most definitely clear it at each boot.

Others will clear it on cold boots but not warm boots.

Yet other environments never clear it at all, or leave it with odd patterns.

So it _could_ be useful as added input to the initial random state,
but it equally well might be totally pointless. It's really hard to
even guess.

There would be nothing wrong by trying to do add_device_randomness()
from some unused-at-boot memory area, but it's unclear what memory
area you should even attempt to use. Certainly not beginning of RAM or
end of RAM, which are both special and more likely to have been used
by the boot sequence even if it is then marked as unused in the memory
maps.

And if you do it, it's not clear it will add any noise at all. It
_might_. But it might equally well not.
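
A rough way to look at this from userspace, assuming you already have a
dump of some memory region as bytes (how to obtain one is board-specific
and not shown), is to compress the region to a digest and look at its
byte histogram. This is only a sketch with made-up function names, and
the histogram figure is a crude heuristic: it says nothing about how
predictable the contents are to an attacker.

```python
import hashlib
import math
from collections import Counter
from typing import Tuple


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def summarize_region(data: bytes) -> Tuple[str, float]:
    """Return (SHA-256 hex digest, histogram entropy) for a memory dump.

    The digest is the kind of fixed-size value a boot loader could pass
    on as a seed; the entropy figure only flags obviously wiped regions.
    """
    return hashlib.sha256(data).hexdigest(), byte_entropy(data)
```

A region that was cleared at boot scores 0.0, which matches the "totally
pointless" case; a high score only shows the region is not a constant
pattern.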

             Linus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-17 17:42                                                                               ` Lennart Poettering
  2019-09-17 18:01                                                                                 ` Linus Torvalds
@ 2019-09-18 19:56                                                                                 ` Eric W. Biederman
  2019-09-18 20:13                                                                                   ` Linus Torvalds
  2019-09-18 20:15                                                                                   ` Alexander E. Patrakov
  1 sibling, 2 replies; 211+ messages in thread
From: Eric W. Biederman @ 2019-09-18 19:56 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Linus Torvalds, Ahmed S. Darwish, Theodore Y. Ts'o,
	Willy Tarreau, Matthew Garrett, Vito Caputo, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

Lennart Poettering <mzxreary@0pointer.de> writes:

> On Tue, 17.09.19 09:23, Linus Torvalds (torvalds@linux-foundation.org) wrote:
>
>> On Tue, Sep 17, 2019 at 9:08 AM Lennart Poettering <mzxreary@0pointer.de> wrote:
>> >
>> > Here's what I'd propose:
>>
>> So I think this is ok, but I have another proposal. Before I post that
>> one, though, I just wanted to point out:
>>
>> > 1) Add GRND_INSECURE to get those users of getrandom() who do not need
>> >    high quality entropy off its use (systemd has uses for this, for
>> >    seeding hash tables for example), thus reducing the places where
>> >    things might block.
>>
>> I really think that the logic should be the other way around.
>>
>> The getrandom() users that don't need high quality entropy are the
>> ones that don't really think about this, and so _they_ shouldn't be
>> the ones that have to explicitly state anything. To those users,
>> "random is random". By definition they don't much care, and quite
>> possibly they don't even know what "entropy" really means in that
>> context.
>
> So I think people nowadays prefer getrandom() over /dev/urandom
> primarily because of the noisy logging the kernel does when you use
> the latter on a non-initialized pool. If that'd be dropped then I am
> pretty sure that the porting from /dev/urandom to getrandom() you see
> in various projects (such as gdm/x11) would probably not take place.
>
> In fact, speaking for systemd: the noisy logging in the kernel is the
> primary (actually: only) reason that we prefer using RDRAND (if
> available) over /dev/urandom if we need "medium quality" random
> numbers, for example to seed hash tables and such. If the log message
> wasn't there we wouldn't be tempted to bother with RDRAND and would
> just use /dev/urandom like we used to for that.
>
>> > 2) Add a kernel log message if a getrandom(0) client hung for 15s or
>> >    more, explaining the situation briefly, but not otherwise changing
>> >    behaviour.
>>
>> The problem is that when you have some graphical boot, you'll not even
>> see the kernel messages ;(
>
> Well, but as mentioned, there's infrastructure for this, that's why I
> suggested changing systemd-random-seed.service.
>
> We can make boot hang in "sane", discoverable way.
>
> The reason why I think this should also be logged by the kernel is that
> people use netconsole and pstore and whatnot and they should see this
> there. If systemd with its infrastructure brings this to screen via
> plymouth then this wouldn't help people who debug much more low-level.
>
> (I mean, there have been requests to add a logic to systemd that
> refuses booting — or delays it — if the system has a battery and it is
> nearly empty. I am pretty sure adding a clean, discoverable concept of
> "uh, i can't boot for a good reason which is this" wouldn't be the
> worst of ideas)

As I understand it the deep problem is that sometimes we have not
observed enough random activity early in boot.

The cheap solution appears to be copying a random seed from a previous
boot, and I think that will take care of many many cases, and has
already been implemented.  Which reduces this to a system first
boot issue.

So for the first system boot, can we take some special actions to make
it possible to see randomness sooner?  An unconditional filesystem
check, perhaps: something that will initiate disk activity
or other hardware activity that will generate interrupts and allow
us to capture randomness.

For many systems we could even have the installer capture some random
data as a final stage of the installation, and use that to seed
randomness on the first boot.

Somewhere in installing the random seed we need to be careful about
people just copying disk images from one system to another: a
replicated seed probably cannot be considered very random.
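
To illustrate the carry-over idea, here is a minimal sketch of a
seed-file helper. The function name, the default seed size and the file
handling are assumptions, not what any existing service does. It mixes
the old seed in by writing to /dev/urandom, which does not credit any
entropy, and it replaces the seed file right away so the same bytes are
not replayed on the next boot. It does nothing about images that were
cloned after the seed was written.

```python
import os


def carry_over_seed(seed_path: str, pool_path: str = "/dev/urandom",
                    seed_size: int = 512) -> bool:
    """Mix a saved seed into the pool, then store a fresh seed.

    Returns True if an old seed was found and mixed in.
    """
    try:
        with open(seed_path, "rb") as f:
            old_seed = f.read()
    except FileNotFoundError:
        old_seed = b""

    mixed = False
    if old_seed:
        # A plain write mixes the bytes into the pool without crediting
        # entropy, so a replayed or cloned seed cannot make it look ready.
        with open(pool_path, "wb", buffering=0) as pool:
            pool.write(old_seed)
        mixed = True

    # Draw the next seed after mixing, and replace the file atomically so
    # a crash never leaves the previous seed in place for reuse.
    with open(pool_path, "rb") as pool:
        new_seed = pool.read(seed_size)
    tmp_path = seed_path + ".tmp"
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(new_seed)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, seed_path)
    return mixed
```

An installer could call the same helper as its final step, so that even
the first boot finds a seed file.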

My sense is that by copying a random seed from one boot to the next
and by initiating system activity to hurry along the process of
having enough randomness we can have systems where we can almost
always have good random numbers available.

And if we almost always have good random numbers available we won't
have to worry about people getting this wrong.

Am I wrong, or can we just solve random number availability in practically
all cases?

Eric

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-18 19:56                                                                                 ` Eric W. Biederman
@ 2019-09-18 20:13                                                                                   ` Linus Torvalds
  2019-09-18 20:15                                                                                   ` Alexander E. Patrakov
  1 sibling, 0 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-18 20:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Lennart Poettering, Ahmed S. Darwish, Theodore Y. Ts'o,
	Willy Tarreau, Matthew Garrett, Vito Caputo, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

On Wed, Sep 18, 2019 at 12:56 PM Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> The cheap solution appears to be copying a random seed from a previous
> boot, and I think that will take care of many many cases, and has
> already been implemented.  Which reduces this to a system first
> boot issue.

Not really.

Part of the problem is that many people don't _trust_ that "previous
boot entropy".

The lack of trust is sometimes fundamental mistrust ("Who knows where
it came from"), which also tends to cover things like not trusting
rdrand or not trusting the boot loader claimed randomness data.

But the lack of trust has been realistic - if you generated your disk
image by cloning a pre-existing one, you may well have two or more
(up to an unbounded number of) subsequent boots that use the same
"random" data for initialization.

And doing that "boot a pre-existing image" is not as unusual as you'd
think. Some people do it to make bootup faster - there have been
people who work on pre-populating bootup all the way to user mode by
basically making boot be a "resume from disk" kind of event.

So a large part of the problem is that we don't actually trust things
that _should_ be trust-worthy, because we've seen (over and over
again) people mis-using it. So then we do mix in the data into the
randomness pool (because there's no downside to _that_), but we don't
treat it as entropy (because while it _probably_ is, we don't actually
trust it sufficiently).

A _lot_ of the problems with randomness come from these trust issues.
Our entropy counting is very very conservative indeed.
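
For readers who have not looked at the interface, the "mix in" versus
"treat as entropy" distinction shows up in userspace as two different
operations: a plain write to /dev/urandom only mixes, while the
RNDADDENTROPY ioctl also credits a caller-chosen number of bits. The
sketch below only builds the ioctl payload and does not issue it. The
helper name is made up, and the ioctl number is the value as encoded on
x86 and ARM; other architectures encode it differently.

```python
import struct

# _IOW('R', 0x03, int[2]) as encoded on x86 and ARM (assumption: this
# sketch does not handle architectures with a different ioctl encoding).
RNDADDENTROPY = 0x40085203


def rand_pool_info(data: bytes, credited_bits: int) -> bytes:
    """Pack a struct rand_pool_info: entropy_count, buf_size, then buf[].

    credited_bits is the caller's claim about how much entropy the data
    carries; 0 means "mix only", like a plain write to /dev/urandom.
    """
    if not 0 <= credited_bits <= 8 * len(data):
        raise ValueError("cannot credit more bits than the data contains")
    return struct.pack("ii", credited_bits, len(data)) + data
```

A privileged process would pass the result to fcntl.ioctl() on an open
/dev/urandom descriptor together with RNDADDENTROPY. The trust question
is entirely about which value is put in credited_bits.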

            Linus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: Linux 5.3-rc8
  2019-09-18 19:56                                                                                 ` Eric W. Biederman
  2019-09-18 20:13                                                                                   ` Linus Torvalds
@ 2019-09-18 20:15                                                                                   ` Alexander E. Patrakov
  2019-09-18 20:26                                                                                     ` Linus Torvalds
  1 sibling, 1 reply; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-18 20:15 UTC (permalink / raw)
  To: Eric W. Biederman, Lennart Poettering
  Cc: Linus Torvalds, Ahmed S. Darwish, Theodore Y. Ts'o,
	Willy Tarreau, Matthew Garrett, Vito Caputo, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, zhangjs, linux-ext4,
	lkml


[-- Attachment #1: Type: text/plain, Size: 749 bytes --]

19.09.2019 00:56, Eric W. Biederman wrote:

> The cheap solution appears to be copying a random seed from a previous
> boot, and I think that will take care of many many cases, and has
> already been implemented.  Which reduces this to a system first
> boot issue.

No, this is not the solution, if we take seriously not only getrandom 
hangs, but also urandom warnings. In some setups (root on LUKS is one of 
them) they happen early in the initramfs. Therefore "restoring" entropy 
from the previous boot by a script that runs from the main system is too 
late. That's why it is suggested to load at least a part of the random 
seed in the boot loader, and that has not been commonly implemented.

-- 
Alexander E. Patrakov


[-- Attachment #2: S/MIME cryptographic signature --]
[-- Type: application/pkcs7-signature, Size: 4052 bytes --]


* Re: Linux 5.3-rc8
  2019-09-18 20:15                                                                                   ` Alexander E. Patrakov
@ 2019-09-18 20:26                                                                                     ` Linus Torvalds
  2019-09-18 22:12                                                                                       ` Willy Tarreau
  2019-09-27 13:57                                                                                       ` Lennart Poettering
  0 siblings, 2 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-18 20:26 UTC (permalink / raw)
  To: Alexander E. Patrakov
  Cc: Eric W. Biederman, Lennart Poettering, Ahmed S. Darwish,
	Theodore Y. Ts'o, Willy Tarreau, Matthew Garrett,
	Vito Caputo, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml

On Wed, Sep 18, 2019 at 1:15 PM Alexander E. Patrakov
<patrakov@gmail.com> wrote:
>
> No, this is not the solution, if we take seriously not only getrandom
> hangs, but also urandom warnings. In some setups (root on LUKS is one of
> them) they happen early in the initramfs. Therefore "restoring" entropy
> from the previous boot by a script that runs from the main system is too
> late. That's why it is suggested to load at least a part of the random
> seed in the boot loader, and that has not been commonly implemented.

Honestly, I think the bootloader suggestion is naive and silly too.

Yes, we now support it. And no, I don't think people will trust that
either. And I suspect for good reason: there's really very little
reason to believe that bootloaders would be any better than any other
part of the system.

So right now some people trust bootloaders exactly _because_ there
basically is just one or two that do this, and the people who use them
are usually the people who wrote them or are at least closely
associated with them. That will change, and then people will say "why
would I trust that, when we know of bug Xyz".

And I guarantee that those bugs _will_ happen, and people will quite
reasonably then say "yeah, I don't trust the bootloader". Bootloaders
do some questionable things.

The most likely thing to actually be somewhat useful is I feel things
like the kernel just saving the seed by itself in nvram. There's
already an example of this for the EFI random seed thing, but that's
used purely for kexec, I think.

Adding an EFI variable (or other platform nonvolatile thing), and
reading (and writing to it) purely from the kernel ends up being one
of those things where you can then say "ok, if we trust the platform
AT ALL, we can trust that". Since you can't reasonably do things like
add EFI variables to your distro image by mistake.

Of course, even then people will say "I don't trust the platform". But
at some point you just say "you have trust issues" and move on.

            Linus


* [PATCH RFC v4 0/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-15 17:32                             ` [PATCH RFC v2] random: optionally block in getrandom(2) when the " Linus Torvalds
  2019-09-15 18:32                               ` Willy Tarreau
  2019-09-16 18:08                               ` Lennart Poettering
@ 2019-09-18 21:15                               ` Ahmed S. Darwish
  2019-09-18 21:17                                 ` [PATCH RFC v4 1/1] " Ahmed S. Darwish
  2 siblings, 1 reply; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-18 21:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Lennart Poettering, Theodore Y. Ts'o, Eric W. Biederman,
	Alexander E. Patrakov, Michael Kerrisk, lkml, linux-ext4,
	linux-man

Hi,

This is an RFC, and it obviously needs much more testing besides the
"it boots" smoke test I've just done.

Interestingly though, on my current system, the triggered WARN()
**reliably** makes the system get un-stuck... I know this is a very
crude heuristic, but I would personally prefer it to the other
proposals that were mentioned in this jumbo thread.

If I get an OK from Linus on this, I'll send a polished v5: further
real testing, kernel-parameters.txt docs, a new getrandom_wait(7)
manpage as referenced in the WARN() message, and extensions to the
getrandom(2) manpage for new getrandom2().

The new getrandom2() system call is basically a summary of Linus',
Lennart's, and Willy's proposals. Please see the patch #1 commit log,
and the "Link:" section inside it, for a rationale.

@Lennart, since you obviously represent user-space here, any further
notes on the new system call?

thanks,

Ahmed S. Darwish (1):
  random: WARN on large getrandom() waits and introduce getrandom2()

 drivers/char/Kconfig        | 60 ++++++++++++++++++++++++--
 drivers/char/random.c       | 85 ++++++++++++++++++++++++++++++++-----
 include/uapi/linux/random.h | 20 +++++++--
 3 files changed, 148 insertions(+), 17 deletions(-)

--
http://darwish.chasingpointers.com


* [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-18 21:15                               ` [PATCH RFC v4 0/1] random: WARN on large getrandom() waits and introduce getrandom2() Ahmed S. Darwish
@ 2019-09-18 21:17                                 ` Ahmed S. Darwish
  2019-09-18 23:57                                   ` Linus Torvalds
  0 siblings, 1 reply; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-18 21:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Lennart Poettering, Theodore Y. Ts'o, Eric W. Biederman,
	Alexander E. Patrakov, Michael Kerrisk, lkml, linux-ext4,
	linux-man

getrandom(2) was introduced in Linux v3.17 as a new and more
secure interface for pseudorandom data requests.  It attempted to
solve three problems, as compared to /dev/urandom:

  1. the need to access filesystem paths, which can fail, e.g. under a
     chroot

  2. the need to open a file descriptor, which can fail under file
     descriptor exhaustion attacks

  3. the possibility of getting not-so-random data from /dev/urandom,
     due to an incompletely initialized kernel entropy pool

To solve the third point, getrandom(2) was made to block until a
proper amount of entropy has been accumulated to initialize the
CHACHA20 cipher.  This basically made the system call have no
guaranteed upper-bound for its initial waiting time.

Thus when it was introduced at c6e9d6f38894 ("random: introduce
getrandom(2) system call"), it came with a clear warning: "Any
userspace program which uses this new functionality must take care to
assure that if it is used during the boot process, that it will not
cause the init scripts or other portions of the system startup to hang
indefinitely."

Unfortunately, due to multiple factors, including not having this
warning written in a scary-enough language in the manpages, and due to
glibc since v2.25 implementing a BSD-like getentropy(3) in terms of
getrandom(2), modern user-space is calling getrandom(2) in the boot
path everywhere.

Embedded Linux systems were first hit by this, and reports of embedded
systems "getting stuck at boot" began to be common.  Over time, the
issue began to even creep into consumer-level x86 laptops: mainstream
distributions, like Debian Buster, began to recommend installing
haveged as a duct-tape workaround... just to let the system boot. (!)

Moreover, filesystem optimizations in EXT4 and XFS, e.g. b03755ad6f33
("ext4: make __ext4_get_inode_loc plug"), which batched the inode
table I/O issued by directory lookup code, and very fast systemd
boots, further exacerbated the problem by limiting interrupt-based
entropy sources.  This led to large delays until the kernel's
cryptographic random number generator (CRNG) got initialized.

Mitigate the problem, as a first step, in two ways:

  1. Issue a big WARN_ON when any process gets stuck on getrandom(2)
     for more than CONFIG_GETRANDOM_WAIT_THRESHOLD_SEC seconds.

  2. Introduce the new getrandom2(2) system call, with clear semantics
     that can guide user-space in doing the right thing.

On the author's Thinkpad E480 x86 laptop with an ArchLinux user-space,
the ext4 commit mentioned earlier reliably blocked the system on GDM
gnome-session boot. Complain loudly through a WARN_ON if processes
get stuck on getrandom(2). Besides its obvious informational purpose,
the WARN_ON also reliably gets the system unstuck.

Set CONFIG_GETRANDOM_WAIT_THRESHOLD_SEC to a heuristic 30-second
default value. We __deeply encourage__ system integrators and
distribution builders not to increase it much: during system boot, you
either have entropy, or you don't. And if you didn't have entropy, it
will stay like this forever, because if you had, you wouldn't have
blocked in the first place. It's an atomic "either/or" situation, with
no middle ground. Please think twice.

The new getrandom2(2) system call tries to avoid the problems
introduced by its earlier siblings. As Linus mentioned several times
in the bug report thread, Linux should have never provided the
"/dev/random" and "getrandom(GRND_RANDOM)" APIs. These interfaces are
broken by design due to their almost-permanent blockage, leading to
the current misuse of /dev/urandom and getrandom(flags=0) calls. Thus
for getrandom2, introduce the flags:

  1. GRND2_SECURE_UNBOUNDED_INITIAL_WAIT
  2. GRND2_INSECURE

where both extract randomness __exclusively__ from the urandom source.
Due to the clear nature of its new GRND2_* flags, the getrandom2()
system call will never issue any warnings on the kernel log.

OpenBSD, to its credit, got this right from the start by making
/dev/random and /dev/urandom equivalent.

Reported-by: Ahmed S. Darwish <darwish.07@gmail.com>
Link: https://lkml.kernel.org/r/20190910042107.GA1517@darwi-home-pc
Link: https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc
Link: https://lkml.kernel.org/r/20190914222432.GC19710@mit.edu
Link: https://lkml.kernel.org/r/20180514003034.GI14763@thunk.org
Link: https://lkml.kernel.org/r/CAHk-=wjyH910+JRBdZf_Y9G54c1M=LBF8NKXB6vJcm9XjLnRfg@mail.gmail.com
Link: https://lkml.kernel.org/r/20190917052438.GA26923@1wt.eu
Link: https://lkml.kernel.org/r/20190917160844.GC31567@gardel-login
Link: https://lkml.kernel.org/r/CAHk-=wjABG3+daJFr4w3a+OWuraVcZpi=SMUg=pnZ+7+O0E2FA@mail.gmail.com
Link: https://lkml.kernel.org/r/CAHk-=wjQeiYu8Q_wcMgM-nAcW7KsBfG1+90DaTD5WF2cCeGCgA@mail.gmail.com
Link: https://factorable.net ("Widespread Weak Keys in Network Devices")
Link: https://man.openbsd.org/man4/random.4
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
---
 drivers/char/Kconfig        | 60 ++++++++++++++++++++++++--
 drivers/char/random.c       | 85 ++++++++++++++++++++++++++++++++-----
 include/uapi/linux/random.h | 20 +++++++--
 3 files changed, 148 insertions(+), 17 deletions(-)

diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index df0fc997dc3e..772765c36fc3 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -535,8 +535,6 @@ config ADI
 	  and SSM (Silicon Secured Memory).  Intended consumers of this
 	  driver include crash and makedumpfile.
 
-endmenu
-
 config RANDOM_TRUST_CPU
 	bool "Trust the CPU manufacturer to initialize Linux's CRNG"
 	depends on X86 || S390 || PPC
@@ -559,4 +557,60 @@ config RANDOM_TRUST_BOOTLOADER
 	device randomness. Say Y here to assume the entropy provided by the
 	booloader is trustworthy so it will be added to the kernel's entropy
 	pool. Otherwise, say N here so it will be regarded as device input that
-	only mixes the entropy pool.
\ No newline at end of file
+	only mixes the entropy pool.
+
+config GETRANDOM_WAIT_THRESHOLD_SEC
+	int
+	default 30
+	help
+	  The getrandom(2) system call, when asking for entropy from the
+	  urandom source, blocks until the kernel's Cryptographic Random
+	  Number Generator (CRNG) gets initialized. This configuration
+	  option sets the maximum wait time, in seconds, for a process
+	  to get blocked on such a system call before the kernel issues
+	  a loud warning. Rationale follows:
+
+	  When the getrandom(2) system call was created, it came with
+	  the clear warning: "Any userspace program which uses this new
+	  functionality must take care to assure that if it is used
+	  during the boot process, that it will not cause the init
+	  scripts or other portions of the system startup to hang
+	  indefinitely."
+
+	  Unfortunately, due to multiple factors, including not having
+	  this warning written in a scary-enough language in the
+	  manpages, and due to glibc since v2.25 implementing a BSD-like
+	  getentropy(3) in terms of getrandom(2), modern user-space is
+	  calling getrandom(2) in the boot path everywhere.
+
+	  Embedded Linux systems were first hit by this, and reports of
+	  embedded systems "getting stuck at boot" began to be
+	  common. Over time, the issue began to even creep into consumer
+	  level x86 laptops: mainstream distributions, like Debian
+	  Buster, began to recommend installing haveged as a workaround,
+	  just to let the system boot.
+
+	  Filesystem optimizations in EXT4 and XFS exaggerated the
+	  problem, due to aggressive batching of IO requests, and thus
+	  minimizing sources of entropy at boot. This led to large
+	  delays until the kernel's CRNG got initialized.
+
+	  System integrators and distribution builders are not
+	  encouraged to considerably increase this value: during system
+	  boot, you either have entropy, or you don't. And if you didn't
+	  have entropy, it will stay like this forever, because if you
+	  had, you wouldn't have blocked in the first place. It's an
+	  atomic "either/or" situation, with no middle ground. Please
+	  think twice.
+
+	  Ideally, systems would be configured with hardware random
+	  number generators, and/or configured to trust the CPU-provided
+	  RNG's (CONFIG_RANDOM_TRUST_CPU) or boot-loader provided ones
+	  (CONFIG_RANDOM_TRUST_BOOTLOADER).  In addition, userspace
+	  should generate cryptographic keys only as late as possible,
+	  when they are needed, instead of during early boot.  For
+	  non-cryptographic use cases, such as dictionary seeds or MIT
+	  Magic Cookies, the getrandom2(GRND2_INSECURE) system call,
+	  or even random(3), may be more appropriate.
+
+endmenu
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 566922df4b7b..74057e496303 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -322,6 +322,7 @@
 #include <linux/interrupt.h>
 #include <linux/mm.h>
 #include <linux/nodemask.h>
+#include <linux/sched.h>
 #include <linux/spinlock.h>
 #include <linux/kthread.h>
 #include <linux/percpu.h>
@@ -854,12 +855,21 @@ static void invalidate_batched_entropy(void);
 static void numa_crng_init(void);
 
 static bool trust_cpu __ro_after_init = IS_ENABLED(CONFIG_RANDOM_TRUST_CPU);
+static int getrandom_wait_threshold __ro_after_init =
+				CONFIG_GETRANDOM_WAIT_THRESHOLD_SEC;
+
 static int __init parse_trust_cpu(char *arg)
 {
 	return kstrtobool(arg, &trust_cpu);
 }
 early_param("random.trust_cpu", parse_trust_cpu);
 
+static int __init parse_getrandom_wait_threshold(char *arg)
+{
+	return kstrtoint(arg, 0, &getrandom_wait_threshold);
+}
+early_param("random.getrandom_wait_threshold", parse_getrandom_wait_threshold);
+
 static void crng_initialize(struct crng_state *crng)
 {
 	int		i;
@@ -1960,7 +1970,7 @@ random_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 }
 
 static ssize_t
-urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
+_urandom_read(char __user *buf, size_t nbytes, bool warn_on_noninited_crng)
 {
 	unsigned long flags;
 	static int maxwarn = 10;
@@ -1968,7 +1978,7 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 
 	if (!crng_ready() && maxwarn > 0) {
 		maxwarn--;
-		if (__ratelimit(&urandom_warning))
+		if (warn_on_noninited_crng && __ratelimit(&urandom_warning))
 			printk(KERN_NOTICE "random: %s: uninitialized "
 			       "urandom read (%zd bytes read)\n",
 			       current->comm, nbytes);
@@ -1982,6 +1992,12 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 	return ret;
 }
 
+static ssize_t
+urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
+{
+	return _urandom_read(buf, nbytes, true);
+}
+
 static __poll_t
 random_poll(struct file *file, poll_table * wait)
 {
@@ -2118,11 +2134,41 @@ const struct file_operations urandom_fops = {
 	.llseek = noop_llseek,
 };
 
-SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
-		unsigned int, flags)
+static int getrandom_wait(char __user *buf, size_t count,
+			  bool warn_on_large_wait)
 {
+	unsigned long timeout = MAX_SCHEDULE_TIMEOUT;
 	int ret;
 
+	if (warn_on_large_wait && (getrandom_wait_threshold > 0))
+		timeout = HZ * getrandom_wait_threshold;
+
+	do {
+		ret = wait_event_interruptible_timeout(crng_init_wait,
+						       crng_ready(),
+						       timeout);
+		if (ret < 0)
+			return ret;
+
+		if (ret == 0) {
+			WARN(1, "random: %s[%d]: getrandom(%zu bytes) "
+			     "is blocked for more than %d seconds. Check "
+			     "getrandom_wait(7)\n", current->comm,
+			     task_pid_nr(current), count,
+			     getrandom_wait_threshold);
+
+			/* warn once per caller */
+			timeout = MAX_SCHEDULE_TIMEOUT;
+		}
+
+	} while (ret == 0);
+
+	return _urandom_read(buf, count, true);
+}
+
+SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
+		unsigned int, flags)
+{
 	if (flags & ~(GRND_NONBLOCK|GRND_RANDOM))
 		return -EINVAL;
 
@@ -2132,14 +2178,31 @@ SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
 	if (flags & GRND_RANDOM)
 		return _random_read(flags & GRND_NONBLOCK, buf, count);
 
-	if (!crng_ready()) {
-		if (flags & GRND_NONBLOCK)
+	if ((flags & GRND_NONBLOCK) && !crng_ready())
 			return -EAGAIN;
-		ret = wait_for_random_bytes();
-		if (unlikely(ret))
-			return ret;
-	}
-	return urandom_read(NULL, buf, count, NULL);
+
+	return getrandom_wait(buf, count, true);
+}
+
+SYSCALL_DEFINE3(getrandom2, char __user *, buf, size_t, count,
+		unsigned int, flags)
+{
+	if (flags & ~(GRND2_SECURE_UNBOUNDED_INITIAL_WAIT|GRND2_INSECURE))
+		return -EINVAL;
+
+	if (hweight32(flags) != 1)
+		return -EINVAL;
+
+	if (count > INT_MAX)
+		count = INT_MAX;
+
+	if (flags & GRND2_SECURE_UNBOUNDED_INITIAL_WAIT)
+		return getrandom_wait(buf, count, false);
+
+	if (flags & GRND2_INSECURE)
+		return _urandom_read(buf, count, false);
+
+	unreachable();
 }
 
 /********************************************************************
diff --git a/include/uapi/linux/random.h b/include/uapi/linux/random.h
index 26ee91300e3e..3f09a8f6aff3 100644
--- a/include/uapi/linux/random.h
+++ b/include/uapi/linux/random.h
@@ -8,6 +8,7 @@
 #ifndef _UAPI_LINUX_RANDOM_H
 #define _UAPI_LINUX_RANDOM_H
 
+#include <linux/bits.h>
 #include <linux/types.h>
 #include <linux/ioctl.h>
 #include <linux/irqnr.h>
@@ -23,7 +24,7 @@
 /* Get the contents of the entropy pool.  (Superuser only.) */
 #define RNDGETPOOL	_IOR( 'R', 0x02, int [2] )
 
-/* 
+/*
  * Write bytes into the entropy pool and add to the entropy count.
  * (Superuser only.)
  */
@@ -50,7 +51,20 @@ struct rand_pool_info {
  * GRND_NONBLOCK	Don't block and return EAGAIN instead
  * GRND_RANDOM		Use the /dev/random pool instead of /dev/urandom
  */
-#define GRND_NONBLOCK	0x0001
-#define GRND_RANDOM	0x0002
+#define GRND_NONBLOCK				BIT(0)
+#define GRND_RANDOM				BIT(1)
+
+/*
+ * Flags for getrandom2(2)
+ *
+ * GRND2_SECURE		Use urandom pool, block until CRNG is inited
+ * GRND2_INSECURE	Use urandom pool, never block even if CRNG isn't inited
+ *
+ * NOTE: don't mix flag values with GRND, to protect against the
+ * security implications of users passing the invalid flag family
+ * to system calls (GRND_* vs. GRND2_*).
+ */
+#define GRND2_SECURE_UNBOUNDED_INITIAL_WAIT	BIT(7)
+#define GRND2_INSECURE				BIT(8)
 
 #endif /* _UAPI_LINUX_RANDOM_H */
-- 
Ahmed Darwish
http://darwish.chasingpointers.com


* Re: Linux 5.3-rc8
  2019-09-18 20:26                                                                                     ` Linus Torvalds
@ 2019-09-18 22:12                                                                                       ` Willy Tarreau
  2019-09-27 13:57                                                                                       ` Lennart Poettering
  1 sibling, 0 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-18 22:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander E. Patrakov, Eric W. Biederman, Lennart Poettering,
	Ahmed S. Darwish, Theodore Y. Ts'o, Matthew Garrett,
	Vito Caputo, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml

On Wed, Sep 18, 2019 at 01:26:39PM -0700, Linus Torvalds wrote:
> Of course, even then people will say "I don't trust the platform". But
> at some point you just say "you have trust issues" and move on.

It's where our extreme configurability can hurt. Sometimes we'd rather
avoid providing some of these "I don't trust this or that" options and
impose some choices to users: "you need entropy to boot, stop being
childish and collect the small entropy where it is, period". I'm not
certain the other operating systems not experiencing entropy issues
leave as many choices as we do. I can understand how some choices may
be problematic in virtual environments but there are so many other
attack vectors there that randomness is probably a detail.

Willy


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-18 21:17                                 ` [PATCH RFC v4 1/1] " Ahmed S. Darwish
@ 2019-09-18 23:57                                   ` Linus Torvalds
  2019-09-19 14:34                                     ` Theodore Y. Ts'o
                                                       ` (2 more replies)
  0 siblings, 3 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-18 23:57 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Lennart Poettering, Theodore Y. Ts'o, Eric W. Biederman,
	Alexander E. Patrakov, Michael Kerrisk, lkml, linux-ext4,
	linux-man


[-- Attachment #1: Type: text/plain, Size: 4370 bytes --]

On Wed, Sep 18, 2019 at 2:17 PM Ahmed S. Darwish <darwish.07@gmail.com> wrote:
>
> getrandom(2) was introduced in Linux v3.17 as a new and more
> secure interface for pseudorandom data requests.  It attempted to
> solve three problems, as compared to /dev/urandom:

I don't think your patch is really _wrong_, but I think it's silly to
introduce a new system call, when we have 30 bits left in the flags of
the old one, and the old system call checked them.

So it's much simpler and more straightforward to  just introduce a
single new bit #2 that says "I actually know what I'm doing, and I'm
explicitly asking for secure/insecure random data".

And then say that the existing bit #1 just means "I want to wait for entropy".

So then you end up with this:

    /*
     * Flags for getrandom(2)
     *
     * GRND_NONBLOCK    Don't block and return EAGAIN instead
     * GRND_WAIT_ENTROPY        Explicitly wait for entropy
     * GRND_EXPLICIT    Make it clear you know what you are doing
     */
    #define GRND_NONBLOCK               0x0001
    #define GRND_WAIT_ENTROPY   0x0002
    #define GRND_EXPLICIT               0x0004

    #define GRND_SECURE (GRND_EXPLICIT | GRND_WAIT_ENTROPY)
    #define GRND_INSECURE       (GRND_EXPLICIT | GRND_NONBLOCK)

    /* Nobody wants /dev/random behavior, nobody should use it */
    #define GRND_RANDOM 0x0002

which is actually fairly easy to understand. So now we have three
bits, and the values are:

 000  - ambiguous "secure or just lazy/ignorant"
 001 - -EAGAIN or secure
 010 - blocking /dev/random DO NOT USE
 011 - nonblocking /dev/random DO NOT USE
 100 - nonsense, returns -EINVAL
 101 - /dev/urandom without warnings
 110 - blocking secure
 111 - -EAGAIN or secure

and people would be encouraged to use one of these three:

 - GRND_INSECURE
 - GRND_SECURE
 - GRND_SECURE | GRND_NONBLOCK

all of which actually make sense, and none of which have any
ambiguity. And while "GRND_INSECURE | GRND_NONBLOCK" works, it's
exactly the same as just plain GRND_INSECURE - the point is that it
doesn't block for entropy anyway, so non-blocking makes no difference.
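
The eight-row table above can be written out as a small stand-alone
classifier. This is only an illustration: the PGRND_* names are local
stand-ins for the values proposed in this mail (they are not taken from
any released kernel header), and the enum and function names are
hypothetical.

```c
/* Stand-ins for the flag values proposed above; not a kernel ABI. */
#define PGRND_NONBLOCK		0x0001
#define PGRND_WAIT_ENTROPY	0x0002
#define PGRND_EXPLICIT		0x0004

#define PGRND_SECURE	(PGRND_EXPLICIT | PGRND_WAIT_ENTROPY)
#define PGRND_INSECURE	(PGRND_EXPLICIT | PGRND_NONBLOCK)

enum grnd_mode {
	MODE_AMBIGUOUS_BLOCKING,	/* 000 */
	MODE_EAGAIN_OR_SECURE,		/* 001 and 111 */
	MODE_DEV_RANDOM_BLOCKING,	/* 010, legacy */
	MODE_DEV_RANDOM_NONBLOCK,	/* 011, legacy */
	MODE_INVALID,			/* 100 and any unknown bit */
	MODE_URANDOM_NO_WARN,		/* 101 */
	MODE_BLOCKING_SECURE,		/* 110 */
};

/* Map a flags word to the behaviour listed in the table. */
enum grnd_mode classify_flags(unsigned int flags)
{
	switch (flags) {
	case 0:
		return MODE_AMBIGUOUS_BLOCKING;
	case PGRND_NONBLOCK:
	case PGRND_SECURE | PGRND_NONBLOCK:
		return MODE_EAGAIN_OR_SECURE;
	case PGRND_WAIT_ENTROPY:
		return MODE_DEV_RANDOM_BLOCKING;
	case PGRND_WAIT_ENTROPY | PGRND_NONBLOCK:
		return MODE_DEV_RANDOM_NONBLOCK;
	case PGRND_INSECURE:
		return MODE_URANDOM_NO_WARN;
	case PGRND_SECURE:
		return MODE_BLOCKING_SECURE;
	default:
		return MODE_INVALID;
	}
}
```

Note that 001 and 111 land in the same bucket, and a bare
PGRND_EXPLICIT (100) is rejected, matching the rows of the table.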

NOTE! This patch looks bigger than it really is. I've changed the
if-statement in getrandom() to a switch-statement, and I did this:

-       if (count > INT_MAX)
-               count = INT_MAX;
+       count = min_t(size_t, count, INT_MAX >> (ENTROPY_SHIFT + 3));

to match what "urandom_read()" already did. That changes the semantics
a bit, but only for the /dev/random case, and only for insanity (the
limit we truncate to is now 32MB read, rather than 2GB - and we
already had that limit for urandom).
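
The "32MB" figure can be checked with a few lines of stand-alone C.
The sketch assumes ENTROPY_SHIFT is 3, as I believe it is in
drivers/char/random.c at this point; the function name clamp_count() is
made up for illustration.

```c
#include <limits.h>
#include <stddef.h>

/* Assumed value; random.c counts entropy in fractions of a bit. */
#define ENTROPY_SHIFT 3

/*
 * Clamp a byte count the way the min_t() line above does. The extra
 * "+ 3" accounts for converting bytes to bits (multiply by 8).
 */
size_t clamp_count(size_t count)
{
	const size_t limit = INT_MAX >> (ENTROPY_SHIFT + 3);

	return count < limit ? count : limit;
}
```

With a 32-bit int, INT_MAX >> 6 is 33554431, i.e. one byte short of
32 MiB, so any larger request is truncated to that size.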

There is *one* other small semantic change: The old code did
urandom_read() which added warnings, but each warning also _reset_ the
crng_init_cnt. Until it decided not to warn any more, at which point
it also stops that resetting of crng_init_cnt.

And that reset of crng_init_cnt, btw, is some cray cray.

It's basically a "we used up entropy" thing, which is very
questionable to begin with as the whole discussion has shown, but
since it stops doing it after 10 cases, it's not even good security
assuming the "use up entropy" case makes sense in the first place.

So I didn't copy that insanity either. And I'm wondering if removing
it from /dev/urandom might also end up helping Ahmed's case of getting
entropy earlier, when we don't reset the counter.

But other than those two details, none of the existing semantics
changed, we just added the three actually _sane_ cases without any
ambiguity.

In particular, this still leaves the semantics of that nasty
"getrandom(0)" as the same "blocking urandom" that it currently is.
But now it's a separate case, and we can make that perhaps do the
timeout, or at least the warning.

And the new cases are defined to *not* warn. In particular,
GRND_INSECURE very much does *not* warn about early urandom access
when crng isn't ready. Because the whole point of that new mode is
that the user knows it isn't secure.

So that should make getrandom(GRND_INSECURE) palatable to the systemd
kind of use that wanted to avoid the pointless kernel warning.

And we could mark this for stable and try to get it backported so that
it will have better coverage, and encourage people to use the new sane
_explicit_ waiting (or not) for entropy.

Comments? Full patch as attachment.

                  Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2727 bytes --]

 drivers/char/random.c       | 50 +++++++++++++++++++++++++++++++++++++--------
 include/uapi/linux/random.h | 12 +++++++++--
 2 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 5d5ea4ce1442..c14fa4780066 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2123,23 +2123,57 @@ SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
 {
 	int ret;
 
-	if (flags & ~(GRND_NONBLOCK|GRND_RANDOM))
+	if (flags & ~(GRND_NONBLOCK|GRND_WAIT_ENTROPY|GRND_EXPLICIT))
 		return -EINVAL;
 
-	if (count > INT_MAX)
-		count = INT_MAX;
+	count = min_t(size_t, count, INT_MAX >> (ENTROPY_SHIFT + 3));
 
-	if (flags & GRND_RANDOM)
+	switch (flags) {
+	case GRND_SECURE:
+		ret = wait_for_random_bytes();
+		if (ret)
+			return ret;
+		break;
+
+	case GRND_SECURE | GRND_NONBLOCK:
+		if (!crng_ready())
+			return -EAGAIN;
+		break;
+
+	case GRND_INSECURE:
+		break;
+
+	default:
+		return -EINVAL;
+
+	/* BAD. Legacy flags. */
+	case GRND_RANDOM | GRND_NONBLOCK:
+	case GRND_RANDOM:
 		return _random_read(flags & GRND_NONBLOCK, buf, count);
 
-	if (!crng_ready()) {
-		if (flags & GRND_NONBLOCK)
+	case GRND_NONBLOCK:
+		if (!crng_ready())
 			return -EAGAIN;
+		break;
+
+	/*
+	 * People are really confused about whether
+	 * this is secure or insecure. Traditional
+	 * behavior is secure, but there are users
+	 * who clearly didn't want that, and just
+	 * never thought about it.
+	 */
+	case 0:
 		ret = wait_for_random_bytes();
-		if (unlikely(ret))
+		if (ret)
 			return ret;
+		break;
 	}
-	return urandom_read(NULL, buf, count, NULL);
+
+	/* equivalent to urandom_read() without the crazy */
+	ret = extract_crng_user(buf, count);
+	trace_urandom_read(8 * count, 0, ENTROPY_BITS(&input_pool));
+	return ret;
 }
 
 /********************************************************************
diff --git a/include/uapi/linux/random.h b/include/uapi/linux/random.h
index 26ee91300e3e..f933f2a843c0 100644
--- a/include/uapi/linux/random.h
+++ b/include/uapi/linux/random.h
@@ -48,9 +48,17 @@ struct rand_pool_info {
  * Flags for getrandom(2)
  *
  * GRND_NONBLOCK	Don't block and return EAGAIN instead
- * GRND_RANDOM		Use the /dev/random pool instead of /dev/urandom
+ * GRND_WAIT_ENTROPY	Explicitly wait for entropy
+ * GRND_EXPLICIT	Make it clear you know what you are doing
  */
-#define GRND_NONBLOCK	0x0001
+#define GRND_NONBLOCK		0x0001
+#define GRND_WAIT_ENTROPY	0x0002
+#define GRND_EXPLICIT		0x0004
+
+#define GRND_SECURE	(GRND_EXPLICIT | GRND_WAIT_ENTROPY)
+#define GRND_INSECURE	(GRND_EXPLICIT | GRND_NONBLOCK)
+
+/* Nobody wants /dev/random behavior, nobody should use it */
 #define GRND_RANDOM	0x0002
 
 #endif /* _UAPI_LINUX_RANDOM_H */


* Re: Linux 5.3-rc8
  2019-09-18 13:53                                                                                             ` Lennart Poettering
@ 2019-09-19  7:28                                                                                               ` Martin Steigerwald
  0 siblings, 0 replies; 211+ messages in thread
From: Martin Steigerwald @ 2019-09-19  7:28 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Matthew Garrett, Ahmed S. Darwish, Linus Torvalds,
	Theodore Y. Ts'o, Willy Tarreau, Vito Caputo, Andreas Dilger,
	Jan Kara, Ray Strode, William Jon McCann, Alexander E. Patrakov,
	zhangjs, linux-ext4, lkml

Dear Lennart.

Lennart Poettering - 18.09.19, 15:53:25 CEST:
> On Mi, 18.09.19 00:10, Martin Steigerwald (martin@lichtvoll.de) wrote:
> > > getrandom() will never "consume entropy" in a way that will block
> > > any
> > > users of getrandom(). If you don't have enough collected entropy
> > > to
> > > seed the rng, getrandom() will block. If you do, getrandom() will
> > > generate as many numbers as you ask it to, even if no more entropy
> > > is
> > > ever collected by the system. So it doesn't matter how many
> > > clients
> > > you have calling getrandom() in the boot process - either there'll
> > > be
> > > enough entropy available to satisfy all of them, or there'll be
> > > too
> > > little to satisfy any of them.
> > 
> > Right, but then Systemd would not use getrandom() for initial
> > hashmap/ UUID stuff since it
> 
> Actually things are more complex. In systemd there are four classes of
> random values we need:
> 
> 1. High "cryptographic" quality. There are very few needs for this in
[…]
> 2. High "non-cryptographic" quality. This is used for example for
[…]
> 3. Medium quality. This is used for seeding hash tables. These may be
[…]
> 4. Crap quality. There are only a few uses of this, where rand_r() is
>    OK.
> 
> Of these four cases, the first two might block boot. Because the first
> case is not common you won't see blocking that often though for
> them. The second case is very common, but since we use RDRAND you
> won't see it on any recent Intel machines.
> 
> Or to say this all differently: the hash table seeding and the uuid
> case are two distinct cases in systemd, and I am sure they should be.

Thank you very much for your summary of the uses of random numbers in 
Systemd and also for your other mail explaining that "neither RDRAND nor 
/dev/urandom know a concept of "depleting entropy"". I thought they would 
deplete the entropy needed for the initial seeding of the crng.

Thank you also for taking part in this discussion, even if someone put 
your mail address on carbon copy without asking.

I do not claim I understand enough of this random number stuff. But I 
feel it's important that kernel and userspace developers actually talk 
with each other about a sane approach for it. And I believe that the 
complexity involved is part of the issue. I feel an API for obtaining 
random numbers with different quality levels needs to be much, much, much 
simpler to use *properly*.

I felt a bit overwhelmed by the discussion (and by what else is 
happening in my life, just having come back from holding a Linux 
performance workshop in front of about two dozen people), so I intend to 
step back from it. 

If one of my mails actually helped to encourage or facilitate kernel 
space and user space developers talking with each other about a sane 
approach to random numbers, then I may have used my soft skills in a way 
that brings some benefit. For the technical aspects certainly people are 
taking part in this discussion who are much much deeper into the 
intricacies of entropy in Linux and computers in general, so I just hope 
for a good outcome.

Best,
-- 
Martin




* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-18 23:57                                   ` Linus Torvalds
@ 2019-09-19 14:34                                     ` Theodore Y. Ts'o
  2019-09-19 15:20                                       ` Linus Torvalds
  2019-09-20 13:46                                     ` Ahmed S. Darwish
  2019-09-26 20:42                                     ` [PATCH v5 0/1] random: getrandom(2): warn on large CRNG waits, introduce new flags Ahmed S. Darwish
  2 siblings, 1 reply; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-19 14:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ahmed S. Darwish, Lennart Poettering, Eric W. Biederman,
	Alexander E. Patrakov, Michael Kerrisk, lkml, linux-ext4,
	linux-man

(Adding linux-api since this patch proposes an API change; both by
changing the existing behavior, and adding new flags and possibly a
new system call.)

On Wed, Sep 18, 2019 at 04:57:58PM -0700, Linus Torvalds wrote:
> On Wed, Sep 18, 2019 at 2:17 PM Ahmed S. Darwish <darwish.07@gmail.com> wrote:
> >
> > Since Linux v3.17, getrandom(2) has been created as a new and more
> > secure interface for pseudorandom data requests.  It attempted to
> > solve three problems, as compared to /dev/urandom:
> 
> I don't think your patch is really _wrong_, but I think it's silly to
> introduce a new system call, when we have 30 bits left in the flags of
> the old one, and the old system call checked them.

The only reason to introduce a new system call is if we were going to
keep the existing behavior of getrandom.  Given that the patch changes
what getrandom(0) does, I agree there's no point to adding a new system
call.

> There is *one* other small semantic change: The old code did
> urandom_read() which added warnings, but each warning also _reset_ the
> crng_init_cnt. Until it decided not to warn any more, at which point
> it also stops that resetting of crng_init_cnt.
> 
> And that reset of crng_init_cnt, btw, is some cray cray.
> 
> It's basically a "we used up entropy" thing, which is very
> questionable to begin with as the whole discussion has shown, but
> since it stops doing it after 10 cases, it's not even good security
> assuming the "use up entropy" case makes sense in the first place.

It was a bug that it stopped doing it after 10 tries, and there's a
really good reason for it.  Yes, the "using up entropy" thing doesn't
make much sense in the general case.  But we still need some threshold
for deciding whether or not it's been sufficiently initialized such
that we consider the CRNG initialized.

The reason for zeroing it after we expose state is because otherwise
if the pool starts in a known state (the attacker knows the starting
configuration, knows the DMI table that we're mixing into the pool
since that's a constant, etc.), then after we've injected a small
amount of uncertainty in the pool --- say, we started with a single
known state of the pool, and after injecting some randomness, there
are 64 possible states of the pool.  If the attacker can read from
/dev/urandom, the attacker can know which of the 64 possible states of
the pool it's in.  Now suppose we inject more uncertainty, so that
there's another 64 unknown states, and the attacker is able to
constantly read from /dev/urandom in a tight loop; it'll be able to
keep up with the entropy being injected, and so even though
we've injected 256 "bits" of uncertainty, the attacker will still know
the state of the pool.  That's why when we read from the pool, we need
to clear the entropy bits.

This is sometimes called a "state extension attack", and there have
been attacks that have been carried out against RNGs that don't
protect against it.  What happened is when I added the rate-limiting
to the uninitialized /dev/urandom warning, I accidentally wiped out
the protection.  But it was there for a reason.
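
As a rough illustration of the argument above, here is a toy userspace
sketch in C.  Nothing in it is kernel code: the 64-bit "pool" and the
names toy_mix(), toy_inject(), toy_output() and toy_state_extension()
are all made up for this example, and the mixer is just a bijective
integer hash standing in for the real pool hash.  It assumes the
attacker knows the boot state and gets to read one output after every
6-bit (64-possibility) injection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bijective 64-bit mixer; a stand-in for the real pool hash. */
static uint64_t toy_mix(uint64_t x)
{
	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

/* Inject a small secret (6 bits, i.e. 64 possibilities) into the state. */
static uint64_t toy_inject(uint64_t state, unsigned int secret6)
{
	return toy_mix(state ^ (uint64_t)(secret6 & 63));
}

/* What a reader of the toy "urandom" sees for a given state. */
static uint64_t toy_output(uint64_t state)
{
	return toy_mix(state + 0x9e3779b97f4a7c15ULL);
}

/*
 * Simulate `rounds` injections of 6 bits each.  After every injection the
 * attacker reads one output and tries all 64 candidate secrets.  Returns 1
 * if the attacker's copy of the state matches the real one at the end;
 * *guesses receives the total number of candidates tried.
 */
static int toy_state_extension(unsigned int rounds, unsigned long *guesses)
{
	uint64_t real = 0x1234, attacker = 0x1234;	/* known boot state */
	uint64_t lcg = 42;				/* source of toy secrets */

	assert(guesses != NULL);
	*guesses = 0;

	for (unsigned int r = 0; r < rounds; r++) {
		uint64_t seen;

		lcg = lcg * 6364136223846793005ULL + 1442695040888963407ULL;
		real = toy_inject(real, (unsigned int)(lcg >> 58));
		seen = toy_output(real);

		for (unsigned int c = 0; c < 64; c++) {
			(*guesses)++;
			if (toy_output(toy_inject(attacker, c)) == seen) {
				attacker = toy_inject(attacker, c);
				break;
			}
		}
	}
	return attacker == real;
}
```

Calling toy_state_extension(43, &guesses) injects 43 * 6 = 258 "bits"
in total, yet the attacker stays in sync after at most 43 * 64 = 2752
guesses, because each step only ever hides 6 bits from someone who
already knew the previous state.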

> And the new cases are defined to *not* warn. In particular,
> GRND_INSECURE very much does *not* warn about early urandom access
> when crng isn't ready. Because the whole point of that new mode is
> that the user knows it isn't secure.
> 
> So that should make getrandom(GRND_INSECURE) palatable to the systemd
> kind of use that wanted to avoid the pointless kernel warning.

Yes, that's clearly the right thing to do.  I do think we need to
restore the state extension attack protections, though.

> +	/*
> +	 * People are really confused about whether
> +	 * this is secure or insecure. Traditional
> +	 * behavior is secure, but there are users
> +	 * who clearly didn't want that, and just
> +	 * never thought about it.
> +	 */
> +	case 0:
>  		ret = wait_for_random_bytes();
> -		if (unlikely(ret))
> +		if (ret)
>  			return ret;
> +		break;

I'm happy this proposal is not changing the behavior of getrandom(0).
Why not just remap 0 to GRND_EXPLICIT | GRND_WAIT_ENTROPY, though?  It
will have the same effect, and it'll make it clear what we're doing.

Later on, when we rip out /dev/random pool code (and make reading from
/dev/random the equivalent of getrandom(GRND_SECURE)), we'll need to
similarly map the legacy combination of flags for GRND_RANDOM and
GRND_RANDOM | GRND_NONBLOCK.

						- Ted


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-19 14:34                                     ` Theodore Y. Ts'o
@ 2019-09-19 15:20                                       ` Linus Torvalds
  2019-09-19 15:50                                         ` Linus Torvalds
                                                           ` (2 more replies)
  0 siblings, 3 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-19 15:20 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Ahmed S. Darwish, Lennart Poettering, Eric W. Biederman,
	Alexander E. Patrakov, Michael Kerrisk, lkml, linux-ext4,
	linux-man

On Thu, Sep 19, 2019 at 7:34 AM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> > It's basically a "we used up entropy" thing, which is very
> > questionable to begin with as the whole discussion has shown, but
> > since it stops doing it after 10 cases, it's not even good security
> > assuming the "use up entropy" case makes sense in the first place.
>
> It was a bug that it stopped doing it after 10 tries, and there's a
> really good reason for it.

I really doubt that.

> The reason for zeroing it after we expose state is because otherwise
> if the pool starts in a known state (the attacker knows the starting
> configuration, knows the DMI table that we're mixing into the pool
> since that's a constant, etc.),

That's at least partly because our pool hashing has what looks like a
fairly sad property.

Yes, it hashes it using a good hash, but it does so in a way that
makes it largely possible to follow the hashing and repeat it and
analyze it.

That breaks if we have hw randomness, because it does the

        if (arch_get_random_long(&v))
                crng->state[14] ^= v;

so it always mixes in hardware randomness as part of the extraction,
but we don't mix anything else unpredictable - or even
process-specific - state in. So without hw randomness, you can try to
get a lot of data over a lot of boots - and for long times during
boots - and maybe find the pattern.

But honestly, this isn't realistic. I can point to emails where *you*
are arguing against other hashing algorithms because the whole state
extension attack simply isn't realistic.

And I think it's also pretty questionable how we don't try to mix in
anything timing/process-specific when extracting, which is what makes
that "do lots of boots" possible.

The silly "reset crng_init_cnt" does absolutely nothing to help that,
but in fact what it does is to basically give the attacker a way to
get an infinite stream of data without any reseeding (because that
only happens after crng_read()), and able to extend that "block at
boot" time indefinitely while doing so.

Also honestly, if the attacker already has access to the system at
boot, you have some fairly big problems to begin with.

So a much bigger issue than the state extension attack (pretty much
purely theoretical, given any entropy at all, which we _will_ have
even without the crng_init_cnt clearing) is the fact that right now we
really are predictable if there are no hardware interrupts, and people
have used /dev/urandom because other sources weren't useful.

And the fact is, we *know* people use /dev/urandom exactly because
other sources haven't been useful.

And unlike your theoretical state extension attack, I can point you to
black hat presentations that literally talk about using the fact that
we delay mixing in the input pool hash to know what's going on:

  https://www.blackhat.com/docs/eu-14/materials/eu-14-Kedmi-Attacking-The-Linux-PRNG-On-Android-Weaknesses-In-Seeding-Of-Entropic-Pools-And-Low-Boot-Time-Entropy.pdf

That's a real attack. Based on the REAL fact that we currently have to
use the urandom logic because the entropy-waiting one is useless, and
in fact depends on the re-seeding happening too late.

Yes, yes, our urandom has changed since that attack, and we use chacha
instead of sha1 these days. We have other changes too. But I don't see
anything fundamentally different.

And all your arguments seem to make that _real_ security issue just
worse, exactly because we also avoid reseeding while crng_init is
zero.

> I'm happy this proposal is not changing the behavior of getrandom(0).
> Why not just remap 0 to GRND_EXPLICIT | GRND_WAIT_ENTROPY, though?  It
> will have the same effect, and it'll make it clear what we're doing.

Have you not followed the whole discussion? Didn't you read the comment?

People use "getrandom(0)" not because they want secure randomness, but
because that's the default.

And we *will* do something about it. This patch didn't, because I want
to be able to backport it to stable, so that everybody is happier with
saying "ok, I'll use the new getrandom(GRND_INSECURE)".

Because getrandom(0) will NOT be the same as GRND_EXPLICIT |
GRND_WAIT_ENTROPY.

getrandom(0) is the "I don't know what I am doing" thing. It could be
somebody that wants real secure random numbers. Or it could *not* be
one of those, and need the timeout.
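
For callers that do know what they want, a minimal userspace sketch of
stating that intent explicitly might look like the following.  It only
uses getrandom(2) with GRND_NONBLOCK, which exists in released kernels;
the GRND_INSECURE, GRND_EXPLICIT and GRND_WAIT_ENTROPY flags discussed
in this thread are proposals and are deliberately not used here.  The
helper name fill_random_nonblocking() is made up for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Fill buf with len random bytes without ever blocking at boot.
 * Returns 0 on success, or -1 with errno set.  EAGAIN means the CRNG is
 * not initialized yet, so the caller has to decide what to do instead of
 * silently stalling.
 */
static int fill_random_nonblocking(void *buf, size_t len)
{
	unsigned char *p = buf;

	while (len > 0) {
		ssize_t n = getrandom(p, len, GRND_NONBLOCK);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return -1;		/* EAGAIN, ENOSYS, ... */
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

The point of the sketch is only that the "not seeded yet" case becomes
visible to the caller as EAGAIN, rather than showing up as a boot that
hangs.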

> Later on, when we rip out /dev/random pool code (and make reading from
> /dev/random the equivalent of getrandom(GRND_SECURE)), we'll need to
> similarly map the legacy combination of flags for GRND_RANDOM and
> GRND_RANDOM | GRND_NONBLOCK.

And that is completely immaterial, because the "I'm confused" case
isn't about GRND_RANDOM. Nobody uses that anyway, and more importantly
it's not the case that has caused bugs. That one blocks even during
normal execution, so that one - despite being completely useless -
actually has the one good thing going for it that it's testable.
People will see the "oh, that took a long time" during testing. And
then they'll stop using it.

Ted - you really don't seem to be making any distinction between
"these are real problems that should be fixed" vs "this is theory that
isn't relevant".

The "getrandom(0)" is a real problem that needs to be fixed.

The warnings from /dev/urandom are real problems that people
apparently have worked around by (incorrectly) using getrandom(0).

The "hashing the random pool still leaves identities in place" is a
real problem that had a real attack.

The state extension attack? Complete theory (again, I can point to you
saying the same thing in other threads), and the "fix" of resetting
the counter and not reseeding seems to be anything but.

            Linus


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-19 15:20                                       ` Linus Torvalds
@ 2019-09-19 15:50                                         ` Linus Torvalds
  2019-09-20 13:13                                           ` Theodore Y. Ts'o
  2019-09-19 20:04                                         ` Linus Torvalds
  2019-09-20 13:08                                         ` Theodore Y. Ts'o
  2 siblings, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-19 15:50 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Ahmed S. Darwish, Lennart Poettering, Eric W. Biederman,
	Alexander E. Patrakov, Michael Kerrisk, lkml, linux-ext4,
	linux-man


[-- Attachment #1: Type: text/plain, Size: 1329 bytes --]

On Thu, Sep 19, 2019 at 8:20 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The silly "reset crng_init_cnt" does absolutely nothing to help that,
> but in fact what it does is to basically give the attacker a way to
> get an infinite stream of data without any reseeding (because that
> only happens after crng_read()), and able to extend that "block at
> boot" time indefinitely while doing so.

.. btw, instead of bad workarounds for a theoretical attack, here's
something that should add actual *practical* real value: use the time
of day (whether from an RTC device, or from ntp) to add noise to the
random pool.

If you let attackers in before you've set the clock on the device,
you're doing something seriously wrong.

And while this doesn't add much "serious" entropy, it does mean that
the whole "let's look for identical state" which is a _real_ attack,
goes out the window.

In other words, this is about real security, not academic papers.

Of course, attackers can still see possible bad random values from
before the clock was set (possibly from things like TCP sequence
numbers etc, or from that AT_RANDOM of a very early process, which was
part of the Android attack). But doing things like delaying
reseeding sure isn't helping, which is what the crng_init_cnt reset does.

                 Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 867 bytes --]

 kernel/time/timekeeping.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index ca69290bee2a..67e74f7f4198 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -22,6 +22,7 @@
 #include <linux/pvclock_gtod.h>
 #include <linux/compiler.h>
 #include <linux/audit.h>
+#include <linux/random.h>
 
 #include "tick-internal.h"
 #include "ntp_internal.h"
@@ -1256,6 +1257,7 @@ int do_settimeofday64(const struct timespec64 *ts)
 
 	/* signal hrtimers about time change */
 	clock_was_set();
+	add_device_randomness(ts, sizeof(*ts));
 
 	if (!ret)
 		audit_tk_injoffset(ts_delta);
@@ -1304,6 +1306,7 @@ static int timekeeping_inject_offset(const struct timespec64 *ts)
 
 	/* signal hrtimers about time change */
 	clock_was_set();
+	add_device_randomness(ts, sizeof(*ts));
 
 	return ret;
 }


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-19 15:20                                       ` Linus Torvalds
  2019-09-19 15:50                                         ` Linus Torvalds
@ 2019-09-19 20:04                                         ` Linus Torvalds
  2019-09-19 20:45                                           ` Alexander E. Patrakov
  2019-09-23 11:55                                           ` David Laight
  2019-09-20 13:08                                         ` Theodore Y. Ts'o
  2 siblings, 2 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-19 20:04 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Ahmed S. Darwish, Lennart Poettering, Eric W. Biederman,
	Alexander E. Patrakov, Michael Kerrisk, lkml, linux-ext4,
	linux-man


[-- Attachment #1: Type: text/plain, Size: 2444 bytes --]

On Thu, Sep 19, 2019 at 8:20 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Yes, it hashes it using a good hash, but it does so in a way that
> makes it largely possible to follow the hashing and repeat it and
> analyze it.
>
> That breaks if we have hw randomness, because it does the
>
>         if (arch_get_random_long(&v))
>                 crng->state[14] ^= v;
>
> so it always mixes in hardware randomness as part of the extraction,
> but we don't mix anything else unpredictable - or even
> process-specific - state in.

So this is the other actual _serious_ patch I'd suggest: replace the

          if (arch_get_random_long(&v))
                  crng->state[14] ^= v;

with

          if (!arch_get_random_long(&v))
                  v = random_get_entropy();
          crng->state[14] += v;

instead. Yeah, it still doesn't help on machines that don't even have
a cycle counter, but it at least means that if you don't have a CPU
rdrand (or equivalent) but you do have a cycle counter, the
extraction of randomness from the pool doesn't just do the
(predictable) mutation for the backtracking, but actually picks up
some very hard to predict timing effects.

Again, in this case a cycle counter really does add a small amount of
entropy (everybody agrees that modern CPU's are simply too complex to
be predictable at a cycle level), but that's not really the point. The
point is that now doing the extraction really fundamentally changes
the state in unpredictable ways, so that you don't have that "if I
recognize a value, I know what the next value will be" kind of attack.

Which, as mentioned, is actually not a purely theoretical concern.

Note small detail above: I changed the ^= to a +=. Addition tends to
be better (due to carry between bits) when there might be bit
commonalities.  Particularly with something like a cycle count where
two xors can mostly cancel out previous bits rather than move bits
around in the word.
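
A tiny userspace sketch of that xor-vs-add point, with made-up helper
names (mix_xor() and mix_add() are not kernel functions) and a plain
32-bit word standing in for crng->state[14]:

```c
#include <stdint.h>

/* Fold two samples into a state word with xor. */
static uint32_t mix_xor(uint32_t state, uint32_t a, uint32_t b)
{
	state ^= a;
	state ^= b;	/* bits common to a and b cancel out again */
	return state;
}

/* Fold two samples into a state word with addition (mod 2^32). */
static uint32_t mix_add(uint32_t state, uint32_t a, uint32_t b)
{
	state += a;
	state += b;	/* carries spread the change across the word */
	return state;
}
```

With two identical samples, mix_xor() returns the state unchanged,
while mix_add() moves it by twice the sample.  With two nearby cycle
counts such as 0x1000 and 0x1001, the xor version only flips the lowest
bit of the state, whereas the add version changes it by 0x2001.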

With an actual random input from rdrand, the xor-vs-add choice is
immaterial, of course, so the old code made sense in that
context.

In the attached patch I also moved the arch_get_random_long() and
random_get_entropy() to outside the crng spinlock. We're not talking
blocking operations, but it can easily be hundreds of cycles with
rdrand retries, or the random_get_entropy() reading an external clock
on some architectures.

                 Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 659 bytes --]

diff --git a/drivers/char/random.c b/drivers/char/random.c
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1057,9 +1057,10 @@ static void _extract_crng(struct crng_state *crng,
 	    (time_after(crng_global_init_time, crng->init_time) ||
 	     time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL)))
 		crng_reseed(crng, crng == &primary_crng ? &input_pool : NULL);
+	if (!arch_get_random_long(&v))
+		v = random_get_entropy();
 	spin_lock_irqsave(&crng->lock, flags);
-	if (arch_get_random_long(&v))
-		crng->state[14] ^= v;
+	crng->state[14] += v;
 	chacha20_block(&crng->state[0], out);
 	if (crng->state[12] == 0)
 		crng->state[13]++;


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-19 20:04                                         ` Linus Torvalds
@ 2019-09-19 20:45                                           ` Alexander E. Patrakov
  2019-09-19 21:47                                             ` Linus Torvalds
  2019-09-23 11:55                                           ` David Laight
  1 sibling, 1 reply; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-19 20:45 UTC (permalink / raw)
  To: Linus Torvalds, Theodore Y. Ts'o
  Cc: Ahmed S. Darwish, Lennart Poettering, Eric W. Biederman,
	Michael Kerrisk, lkml, linux-ext4, linux-man

[-- Attachment #1.1: Type: text/plain, Size: 3608 bytes --]

20.09.2019 01:04, Linus Torvalds wrote:

> instead. Yeah, it still doesn't help on machines that don't even have
> a cycle counter, but it at least means that if you don't have a CPU
> rdrand (or equivalent) but you do have a cycle counter, the
> extraction of randomness from the pool doesn't just do the
> (predictable) mutation for the backtracking, but actually picks up
> some very hard to predict timing effects.
> 
> Again, in this case a cycle counter really does add a small amount of
> entropy (everybody agrees that modern CPU's are simply too complex to
> be predictable at a cycle level), but that's not really the point. The
> point is that now doing the extraction really fundamentally changes
> the state in unpredictable ways, so that you don't have that "if I
> recognize a value, I know what the next value will be" kind of attack.

This already resembles in-kernel haveged (except that it doesn't credit 
entropy), and Willy Tarreau said "collect the small entropy where it is, 
period" today. So, too many people touched upon the topic in one day, 
and therefore I'll bite.

We already have user-space software (haveged and modern versions of 
rngd) that extract supposed entropy from clock jitter and feed it back 
to the kernel via /dev/random (crediting it). Indeed, at present, on 
some hardware this is the only way for distributions and users to 
collect enough entropy during boot and avoid stalls - all other 
suggestions are simply non-constructive. Also, Google's Fuchsia OS does 
use and credit jitter entropy.

For the record: I do not have a justifiable opinion whether haveged/rngd 
output (known as jitter entropy) actually contains any entropy. I 
understand that there are two possible viewpoints here. The rest of the 
email is written under the assumption that haveged does provide real 
entropy and not fake entropy.

The problem that I have with the current situation is that distributions 
and users, when they set up their systems to run haveged or rngd, often 
do it incorrectly (even, as mentioned, under the assumption that haveged 
is something valid and useful). The most common mistake is relying on 
systemd-provided default dependencies, thus not starting such software 
as early as possible. Even worse, no initramfs generator allows one to 
easily include haveged/rngd in the initramfs and run it there. And for 
me, the first urandom warning comes from the initramfs, so anything 
started from the main system is, arguably, already too late.

Therefore, I think, an in-kernel hwrng that exposes jitter entropy is 
something useful (for those who agree that jitter entropy is not fake), 
because it avoids the pitfall-ridden userspace setup. Just as an 
exercise, I have implemented a very simple driver (attached as a patch) 
that does just that. I am only half-serious here, the driver is only 
lightly tested in KVM without any devices except an unconnected virtio 
network card, not on any real hardware. Someone else can also find it 
useful as a test/fake hwrng driver.

I am aware that there was an earlier decision that jitter entropy should 
not be credited, i.e. effectively a pre-existing NAK from Theodore Ts'o. 
But, well, distributions are already overriding this decision in 
userspace, and do it badly, so in my viewpoint, the driver would be a 
net win if some mechanism is added that makes it a no-op by default even 
if the driver is built-in. E.g. an explicit "enable" parameter, but I am 
open to other suggestions, too.

-- 
Alexander E. Patrakov

[-- Attachment #1.2: 0001-hw_random-Add-jitterentropy_hwrng.patch --]
[-- Type: text/x-patch, Size: 4507 bytes --]

From 2836990aff5bc1dab6a4e927304247dae469c774 Mon Sep 17 00:00:00 2001
From: "Alexander E. Patrakov" <patrakov@gmail.com>
Date: Thu, 19 Sep 2019 01:18:39 +0500
Subject: [PATCH] hw_random: Add jitterentropy_hwrng

This re-exports the existing "jitterentropy_rng" cryptoapi RNG as a
hwrng. The use case is to replace haveged, which distributions
often misconfigure by running it too late, while it is really needed
even in the initramfs on some systems.

Signed-off-by: Alexander E. Patrakov <patrakov@gmail.com>
---
 drivers/char/hw_random/Kconfig               | 20 ++++++
 drivers/char/hw_random/Makefile              |  1 +
 drivers/char/hw_random/jitterentropy-hwrng.c | 70 ++++++++++++++++++++
 3 files changed, 91 insertions(+)
 create mode 100644 drivers/char/hw_random/jitterentropy-hwrng.c

diff --git a/drivers/char/hw_random/Kconfig b/drivers/char/hw_random/Kconfig
index 59f25286befe..ff2102c0159c 100644
--- a/drivers/char/hw_random/Kconfig
+++ b/drivers/char/hw_random/Kconfig
@@ -35,6 +35,26 @@ config HW_RANDOM_TIMERIOMEM
 
 	  If unsure, say Y.
 
+config HW_RANDOM_JITTERENTROPY
+        tristate "Jitter Entropy HW Random Number Generator support"
+        select CRYPTO_JITTERENTROPY
+        ---help---
+          This driver provides kernel-side support for extracting entropy
+          from CPU and memory clock jitter.
+
+          jitterentropy-hwrng serves the same purpose as haveged, but is in
+          the kernel. So, if you otherwise would have to run haveged, build
+          this driver instead, it has an advantage of being available very
+          early in the boot process.
+
+          Note that it is still not known whether clock jitter provides any
+          actual entropy.
+
+          To compile this driver as a module, choose M here: the
+          module will be called jitterentropy-hwrng.
+
+          If unsure, say N.
+
 config HW_RANDOM_INTEL
 	tristate "Intel HW Random Number Generator support"
 	depends on (X86 || IA64) && PCI
diff --git a/drivers/char/hw_random/Makefile b/drivers/char/hw_random/Makefile
index 7c9ef4a7667f..9c6d1d3626f6 100644
--- a/drivers/char/hw_random/Makefile
+++ b/drivers/char/hw_random/Makefile
@@ -6,6 +6,7 @@
 obj-$(CONFIG_HW_RANDOM) += rng-core.o
 rng-core-y := core.o
 obj-$(CONFIG_HW_RANDOM_TIMERIOMEM) += timeriomem-rng.o
+obj-$(CONFIG_HW_RANDOM_JITTERENTROPY) += jitterentropy-hwrng.o
 obj-$(CONFIG_HW_RANDOM_INTEL) += intel-rng.o
 obj-$(CONFIG_HW_RANDOM_AMD) += amd-rng.o
 obj-$(CONFIG_HW_RANDOM_ATMEL) += atmel-rng.o
diff --git a/drivers/char/hw_random/jitterentropy-hwrng.c b/drivers/char/hw_random/jitterentropy-hwrng.c
new file mode 100644
index 000000000000..b7aeefe4f47d
--- /dev/null
+++ b/drivers/char/hw_random/jitterentropy-hwrng.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2019 Alexander E. Patrakov <patrakov@gmail.com>
+ *
+ * Driver that exposes CPU clock jitter as a hardware random number generator
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/delay.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/hw_random.h>
+#include <crypto/rng.h>
+
+static struct crypto_rng *drng;
+
+static int jitterentropy_rng_read(struct hwrng *rng, void *data, size_t max, bool wait)
+{
+	int err;
+
+	/* Prevent the hwrng_fill thread from impeding progress of everything else */
+	if (wait)
+		schedule();
+
+	err = crypto_rng_get_bytes(drng, data, max);
+	if (err)
+		return err;
+	return max;
+}
+
+static struct hwrng jitterentropy_rng = {
+	.name		= KBUILD_MODNAME,
+	.read		= jitterentropy_rng_read,
+	.quality	= 4, /* minimum that guarantees progress in hwrng_fill thread */
+};
+
+static int __init mod_init(void)
+{
+	int ret;
+
+	pr_info("Registering the driver\n");
+	drng = crypto_alloc_rng("jitterentropy_rng", 0, 0);
+	if (IS_ERR(drng)) {
+		pr_err("crypto_alloc_rng() failed\n");
+		return PTR_ERR(drng);
+	}
+
+	ret = hwrng_register(&jitterentropy_rng);
+	if (ret) {
+		crypto_free_rng(drng);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void __exit mod_exit(void)
+{
+	hwrng_unregister(&jitterentropy_rng);
+	crypto_free_rng(drng);
+}
+
+module_init(mod_init);
+module_exit(mod_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Alexander E. Patrakov <patrakov@gmail.com>");
+MODULE_DESCRIPTION("Exposes clock jitter as a hwrng");
+MODULE_SOFTDEP("pre: jitterentropy_rng");
-- 
2.23.0


[-- Attachment #2: S/MIME cryptographic signature --]
[-- Type: application/pkcs7-signature, Size: 4052 bytes --]


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-19 20:45                                           ` Alexander E. Patrakov
@ 2019-09-19 21:47                                             ` Linus Torvalds
  2019-09-19 22:23                                               ` Alexander E. Patrakov
  0 siblings, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-19 21:47 UTC (permalink / raw)
  To: Alexander E. Patrakov
  Cc: Theodore Y. Ts'o, Ahmed S. Darwish, Lennart Poettering,
	Eric W. Biederman, Michael Kerrisk, lkml, linux-ext4, linux-man

On Thu, Sep 19, 2019 at 1:45 PM Alexander E. Patrakov
<patrakov@gmail.com> wrote:
>
> This already resembles in-kernel haveged (except that it doesn't credit
> entropy), and Willy Tarreau said "collect the small entropy where it is,
> period" today. So, too many people touched upon the topic in one day,
> and therefore I'll bite.

I'm one of the people who aren't entirely convinced by the jitter
entropy - I definitely believe it exists, I just am not necessarily
convinced about the actual entropy calculations.

So while I do think we should take things like the cycle counter into
account just because I think it's a useful way to force some noise,
I am *not* a huge fan of the jitter entropy driver either, because of
the whole "I'm not convinced about the amount of entropy".

The whole "third order time difference" thing would make sense if the
time difference was some kind of smooth function - which it is at a
macro level.

But at a micro level, I could easily see the time difference having
some very simple pattern - say that your cycle counter isn't really
cycle-granular, and the load takes 5.33 "cycles" and you see a time
difference pattern of (5, 5, 6, 5, 5, 6, ...). No real entropy at all
there, it is 100% reliable.

At a macro level, that's a very smooth curve, and you'd say "ok, time
difference is 5.3333 (repeating)". But that's not what the jitter
entropy code does. It just does differences of differences.

And that completely non-random pattern has a first-order difference of
0, 1, 1, 0, 1, 1, ... and a second order of 1, 0, 1, 1, 0, ... and so on
forever. So the "jitter entropy" logic will assign that completely
repeatable thing entropy, because the delta difference doesn't ever go
away.
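
The arithmetic of that example can be checked with a small userspace
sketch.  This is not the jitterentropy code; abs_deltas() and
any_nonzero() are made-up helpers that only reproduce the "absolute
difference of differences" step on the (5, 5, 6, ...) pattern described
above.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Compute absolute successive differences: out[i] = |in[i + 1] - in[i]|.
 * Writes n - 1 values into out and returns that count (0 if n < 2).
 */
static size_t abs_deltas(const long *in, size_t n, long *out)
{
	if (n < 2)
		return 0;
	for (size_t i = 0; i + 1 < n; i++)
		out[i] = labs(in[i + 1] - in[i]);
	return n - 1;
}

/* Returns 1 if any value is nonzero, i.e. the deltas have not died out. */
static int any_nonzero(const long *v, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (v[i] != 0)
			return 1;
	return 0;
}
```

Feeding it the nine samples 5, 5, 6, 5, 5, 6, 5, 5, 6 and applying
abs_deltas() three times in a row gives 0, 1, 1, 0, 1, 1, 0, 1, then
1, 0, 1, 1, 0, 1, 1, then 1, 1, 0, 1, 1, 0: a fully deterministic input
whose higher-order deltas keep repeating and never settle at zero.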

Maybe I misread it.

We used to (we still do, but we used to too) do that same third-order
delta difference ourselves for the interrupt timing entropy estimation
in add_timer_randomness(). But I think it's more valid with something
that likely has more noise (interrupt timing really _should_ be
noisy). It's not clear that the jitterentropy load really has all that
much noise.

That said, I'm _also_ not a fan of the user mode models - they happen
too late anyway for some users, and as you say, it leaves us open to
random (heh) user mode distribution choices that may be more or less
broken.

I would perhaps be willing to just put my foot down, and say "ok,
we'll solve the 'getrandom(0)' issue by just saying that if that
blocks too much, we'll do the jitter entropy thing".

Making absolutely nobody happy, but working in practice. And maybe
encouraging the people who don't like jitter entropy to use
GRND_SECURE instead.

              Linus


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-19 21:47                                             ` Linus Torvalds
@ 2019-09-19 22:23                                               ` Alexander E. Patrakov
  2019-09-19 23:44                                                 ` Alexander E. Patrakov
  2019-09-20 13:16                                                 ` Theodore Y. Ts'o
  0 siblings, 2 replies; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-19 22:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Ahmed S. Darwish, Lennart Poettering,
	Eric W. Biederman, Michael Kerrisk, lkml, linux-ext4, linux-man


[-- Attachment #1: Type: text/plain, Size: 4336 bytes --]

20.09.2019 02:47, Linus Torvalds wrote:
> On Thu, Sep 19, 2019 at 1:45 PM Alexander E. Patrakov
> <patrakov@gmail.com> wrote:
>>
>> This already resembles in-kernel haveged (except that it doesn't credit
>> entropy), and Willy Tarreau said "collect the small entropy where it is,
>> period" today. So, too many people touched upon the topic in one day,
>> and therefore I'll bite.
> 
> I'm one of the people who aren't entirely convinced by the jitter
> entropy - I definitely believe it exists, I just am not necessarily
> convinced about the actual entropy calculations.
> 
> So while I do think we should take things like the cycle counter into
> account just because I think it's a useful way to force some noise,
> I am *not* a huge fan of the jitter entropy driver either, because of
> the whole "I'm not convinced about the amount of entropy".
> 
> The whole "third order time difference" thing would make sense if the
> time difference was some kind of smooth function - which it is at a
> macro level.
> 
> But at a micro level, I could easily see the time difference having
> some very simple pattern - say that your cycle counter isn't really
> cycle-granular, and the load takes 5.33 "cycles" and you see a time
> difference pattern of (5, 5, 6, 5, 5, 6, ...). No real entropy at all
> there, it is 100% reliable.
> 
> At a macro level, that's a very smooth curve, and you'd say "ok, time
> difference is 5.3333 (repeating)". But that's not what the jitter
> entropy code does. It just does differences of differences.
> 
> And that completely non-random pattern has a first-order difference of
> 0, 1, 1, 0, 1, 1, ... and a second-order one of 1, 0, 1, 1, 0, ... and so on
> forever. So the "jitter entropy" logic will assign that completely
> repeatable thing entropy, because the delta difference doesn't ever go
> away.
> 
> Maybe I misread it.

You didn't. Let me generalize and rephrase the part of the concern that 
I agree with, in my own words:

The same code is used in cryptoapi rng, and also a userspace version 
exists. These two have been tested by the author via the "dieharder" 
tool (see the message for commit d9d67c87), so we know that on his 
machine it actually produces good-quality random bits. However, the 
in-kernel self-test is much, much weaker, and would not catch the 
situation when someone's machine is deterministic in a way that you 
describe, or something similar.

OTOH, I thought that at least part of the real entropy, if it exists, 
comes from the interference of the CPU's memory accesses with the 
refresh cycles that are clocked from an independent oscillator. That's 
why (in order to catch more of them before declaring the crng 
initialized) I have set the quality to the minimum possible that is 
guaranteed to be distinct from zero according to the fixed-point math in 
hwrng_fillfn() in drivers/char/hw_random/core.c.

> 
> We used to (we still do, but we used to too) do that same third-order
> delta difference ourselves for the interrupt timing entropy estimation
> in add_timer_randomness(). But I think it's more valid with something
> that likely has more noise (interrupt timing really _should_ be
> noisy). It's not clear that the jitterentropy load really has all that
> much noise.
> 
> That said, I'm _also_ not a fan of the user mode models - they happen
> too late anyway for some users, and as you say, it leaves us open to
> random (heh) user mode distribution choices that may be more or less
> broken.
> 
> I would perhaps be willing to just put my foot down, and say "ok,
> we'll solve the 'getrandom(0)' issue by just saying that if that
> blocks too much, we'll do the jitter entropy thing".
> 
> Making absolutely nobody happy, but working in practice. And maybe
> encouraging the people who don't like jitter entropy to use
> GRND_SECURE instead.

I think this approach makes sense. For those who don't believe in jitter 
entropy, it changes really nothing (except a one-time delay) to Ahmed's 
first patch that makes getrandom(0) equivalent to /dev/urandom, and 
nobody so far proposed anything better that doesn't break existing 
systems. And for those who do believe in jitter entropy, this makes the 
situation as good as in OpenBSD.

-- 
Alexander E. Patrakov


[-- Attachment #2: S/MIME cryptographic signature --]
[-- Type: application/pkcs7-signature, Size: 4052 bytes --]


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-19 22:23                                               ` Alexander E. Patrakov
@ 2019-09-19 23:44                                                 ` Alexander E. Patrakov
  2019-09-20 13:16                                                 ` Theodore Y. Ts'o
  1 sibling, 0 replies; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-19 23:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Y. Ts'o, Ahmed S. Darwish, Lennart Poettering,
	Eric W. Biederman, Michael Kerrisk, lkml, linux-ext4, linux-man


[-- Attachment #1: Type: text/plain, Size: 2861 bytes --]

20.09.2019 03:23, Alexander E. Patrakov wrote:
> 20.09.2019 02:47, Linus Torvalds wrote:
>> On Thu, Sep 19, 2019 at 1:45 PM Alexander E. Patrakov
>> <patrakov@gmail.com> wrote:
>>>
>>> This already resembles in-kernel haveged (except that it doesn't credit
>>> entropy), and Willy Tarreau said "collect the small entropy where it is,
>>> period" today. So, too many people touched upon the topic in one day,
>>> and therefore I'll bite.
>>
>> I'm one of the people who aren't entirely convinced by the jitter
>> entropy - I definitely believe it exists, I just am not necessarily
>> convinced about the actual entropy calculations.
>>
>> So while I do think we should take things like the cycle counter into
>> account just because I think it's a useful way to force some noise,
>> I am *not* a huge fan of the jitter entropy driver either, because of
>> the whole "I'm not convinced about the amount of entropy".
>>
>> The whole "third order time difference" thing would make sense if the
>> time difference was some kind of smooth function - which it is at a
>> macro level.
>>
>> But at a micro level, I could easily see the time difference having
>> some very simple pattern - say that your cycle counter isn't really
>> cycle-granular, and the load takes 5.33 "cycles" and you see a time
>> difference pattern of (5, 5, 6, 5, 5, 6, ...). No real entropy at all
>> there, it is 100% reliable.
>>
>> At a macro level, that's a very smooth curve, and you'd say "ok, time
>> difference is 5.3333 (repeating)". But that's not what the jitter
>> entropy code does. It just does differences of differences.
>>
>> And that completely non-random pattern has a first-order difference of
>> 0, 1, 1, 0, 1, 1, ... and a second-order one of 1, 0, 1, 1, 0, ... and so on
>> forever. So the "jitter entropy" logic will assign that completely
>> repeatable thing entropy, because the delta difference doesn't ever go
>> away.
>>
>> Maybe I misread it.
> 
> You didn't. Let me generalize and rephrase the part of the concern that 
> I agree with, in my own words:
> 
> The same code is used in cryptoapi rng, and also a userspace version 
> exists. These two have been tested by the author via the "dieharder" 
> tool (see the message for commit d9d67c87), so we know that on his 
> machine it actually produces good-quality random bits. However, the 
> in-kernel self-test is much, much weaker, and would not catch the 
> situation when someone's machine is deterministic in a way that you 
> describe, or something similar.

A constructive suggestion here would be to put the first few thousand
(ok, a completely made up number) raw timing intervals through a "gzip 
compression test" in addition to the third derivative test, just based 
on what we already have in the kernel.

-- 
Alexander E. Patrakov


[-- Attachment #2: S/MIME cryptographic signature --]
[-- Type: application/pkcs7-signature, Size: 4052 bytes --]


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-19 15:20                                       ` Linus Torvalds
  2019-09-19 15:50                                         ` Linus Torvalds
  2019-09-19 20:04                                         ` Linus Torvalds
@ 2019-09-20 13:08                                         ` Theodore Y. Ts'o
  2 siblings, 0 replies; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-20 13:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ahmed S. Darwish, Lennart Poettering, Eric W. Biederman,
	Alexander E. Patrakov, Michael Kerrisk, lkml, linux-ext4,
	linux-man

On Thu, Sep 19, 2019 at 08:20:57AM -0700, Linus Torvalds wrote:
> And unlike your theoretical state extension attack, I can point you to
> black hat presentations that literally talk about using the fact that
> we delay mixing in the input pool hash to know what's going on:
> 
>   https://www.blackhat.com/docs/eu-14/materials/eu-14-Kedmi-Attacking-The-Linux-PRNG-On-Android-Weaknesses-In-Seeding-Of-Entropic-Pools-And-Low-Boot-Time-Entropy.pdf
> 
> That's a real attack. Based on the REAL fact that we currently have to
> use the urandom logic because the entropy-waiting one is useless, and
> in fact depends on the re-seeding happening too late.

Actually, that particular case proves my point.

That particular attack was against Android 4.3 (Android KitKat).
In the 3.4 kernel used by KitKat, before the urandom pool is
considered initialized, 100% of the entropy from
add_interrupt_randomness() goes to the urandom pool, NOT the input
pool.  add_device_entropy() also fed the urandom pool.  And on an
Android device, it doesn't have a keyboard, mouse, or spinning HDD, so
add_timer_randomness() and add_disk_randomness() weren't a factor.

The real problem was that the Android zygote process sampled the
urandom pool too early, and what the attack did was essentially one
where they were trying to determine the state of the pool by looking
at that sampled output of /dev/urandom.

If we make getrandom(0) work like /dev/urandom, it doesn't solve the
problem, because if you read from the entropy pool before we can get
high quality randomness, you're screwed.  The only real answers are
(a) try to get better entropy early, or (b) get userspace to wait
until it's safe to read from /dev/urandom.

Long-term, (a) is the only real way to solve the problem, and whether
you trust the bootloader, or trust the built-in hardware random number
generator (whether it's RDRAND, or some secure element in the device,
etc), we can't control userspace.  We can try to enforce userspace to
be safe by blocking, but that makes people unhappy.  We can certainly
try to influence userspace by annoying them with WARN() stack traces
in the logs, and hope they pay attention, but that's not guaranteed.

> But honestly, this isn't realistic. I can point to emails where *you*
> are  arguing against other hashing algorithms because the whole state
> extension attack simply isn't realistic.

The blackhat presentation which you pointed at *was* actually a state
extension attack.  When I argued against state extension attacks, that
was in cases where people worried about recovery after the pool is
exposed --- and my argument was if you can read from kernel memory
enough to grab the pool state, you have other problems.  Your
observation that if you can install malware that runs at system
initscript/userspace bootup time, you probably have other problems, is
a similar argument, and it's a fair one.  But it *has* happened, as
the blackhat paper demonstrates.

My thinking at the time was that if people are reading from the CRNG
before it's initialized (which could only happen via /dev/urandom),
that was kind of a disaster anyway, so resetting the initialization
count would at least get us to the point where, when the CRNG *was*
declared to be initialized, we could state with high confidence that
we were in a secure state.

> > I'm happy this proposal is not changing the behavior of getrandom(0).
> > Why not just remap 0 to GRND_EXPLICIT | GRND_WAIT_ENTROPY, though?  It
> > will have the same effect, and it'd make it clear what we're doing.
> 
> Have you not followed the whole discussion? Didn't you read the comment?
> 
> People use "getrandom(0)" not because they want secure randomness, but
> because that's the default.
> 
> And we *will* do something about it. This patch didn't, because I want
> to be able to backport it to stable, so that everybody is happier with
> saying "ok, I'll use the new getrandom(GRND_INSECURE)".
> 
> Because getrandom(0) will NOT be the same as GRND_EXPLICIT |
> GRND_WAIT_ENTROPY.

No, I did read the comment.  And I agree that at the moment, that yes,
it is ambiguous.  What I really care about though, is the HUGE
DEPLOYED BASE which is using getrandom(0) *because* they are
generating cryptographic keys, and we will be changing things out from
under them.

We agree that we don't want to change things out from under the stable
users.  I'm pleading that we not screw over existing userspace --- at
least not right away.  Give them *time* to update their source
bases to use getrandom(GRND_SECURE).  So what if we make getrandom(0)
print a ratelimited KERN_ERR deprecation notice that programs should
switch to specifying either GRND_INSECURE or GRND_SECURE, and not
change the current semantics of getrandom(0) for some period of time?
Say, a year.  Or even six months.

If that's not good enough, what if we change getrandom(0) immediately,
but only for those platforms which have a functional
arch_get_random_long() or random_get_entropy()?  That gets us the x86
platform, which is where pretty much all of the users who have
complained have been coming from.  For the IoT/embedded use cases,
blocking is actually a feature, because the problem will be caught
while the product is in development, when the userspace code can be
fixed.

						- Ted


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-19 15:50                                         ` Linus Torvalds
@ 2019-09-20 13:13                                           ` Theodore Y. Ts'o
  0 siblings, 0 replies; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-20 13:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ahmed S. Darwish, Lennart Poettering, Eric W. Biederman,
	Alexander E. Patrakov, Michael Kerrisk, lkml, linux-ext4,
	linux-man

On Thu, Sep 19, 2019 at 08:50:15AM -0700, Linus Torvalds wrote:
> .. btw, instead of bad workarounds for a theoretical attack, here's
> something that should add actual *practical* real value: use the time
> of day (whether from an RTC device, or from ntp) to add noise to the
> random pool.

Actually, we used to seed the pool from the RTC device --- that was the
case in the 3.4 kernel referenced by the Blackhat attack, and it
didn't stop the researchers.  In later kernels, we moved up when
rand_initialized() got called to before time_init(), so
init_std_data() was no longer seeding the pool from the RTC clock.

That being said, adding calls to add_device_randomness() to
do_settimeofday64() and timekeeping_inject_offset() is an obviously
good thing to do.  I'll prepare a separate patch for the random.git
tree to do that.

					- Ted



* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-19 22:23                                               ` Alexander E. Patrakov
  2019-09-19 23:44                                                 ` Alexander E. Patrakov
@ 2019-09-20 13:16                                                 ` Theodore Y. Ts'o
  1 sibling, 0 replies; 211+ messages in thread
From: Theodore Y. Ts'o @ 2019-09-20 13:16 UTC (permalink / raw)
  To: Alexander E. Patrakov
  Cc: Linus Torvalds, Ahmed S. Darwish, Lennart Poettering,
	Eric W. Biederman, Michael Kerrisk, lkml, linux-ext4, linux-man

On Fri, Sep 20, 2019 at 03:23:58AM +0500, Alexander E. Patrakov wrote:
> OTOH, I thought that at least part of the real entropy, if it exists, comes
> from the interference of the CPU's memory accesses with the refresh cycles
> that are clocked from an independent oscillator.

That's not a valid assumption; on *many* systems, there is only a
single master oscillator.  It saves on power, parts cost, reduces the
amount of RF interference, etc.

						- Ted


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-18 23:57                                   ` Linus Torvalds
  2019-09-19 14:34                                     ` Theodore Y. Ts'o
@ 2019-09-20 13:46                                     ` Ahmed S. Darwish
  2019-09-20 14:33                                       ` Andy Lutomirski
  2019-09-20 17:26                                       ` Willy Tarreau
  2019-09-26 20:42                                     ` [PATCH v5 0/1] random: getrandom(2): warn on large CRNG waits, introduce new flags Ahmed S. Darwish
  2 siblings, 2 replies; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-20 13:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Lennart Poettering, Theodore Y. Ts'o, Eric W. Biederman,
	Alexander E. Patrakov, Michael Kerrisk, Willy Tarreau,
	Matthew Garrett, lkml, linux-ext4, linux-api, linux-man

Hi,

On Wed, Sep 18, 2019 at 04:57:58PM -0700, Linus Torvalds wrote:
> On Wed, Sep 18, 2019 at 2:17 PM Ahmed S. Darwish <darwish.07@gmail.com> wrote:
> >
> > Since Linux v3.17, getrandom(2) has been created as a new and more
> > secure interface for pseudorandom data requests.  It attempted to
> > solve three problems, as compared to /dev/urandom:
> >
> I don't think your patch is really _wrong_, but I think it's silly to
> introduce a new system call, when we have 30 bits left in the flags of
> the old one, and the old system call checked them.
> 
> So it's much simpler and more straightforward to  just introduce a
> single new bit #2 that says "I actually know what I'm doing, and I'm
> explicitly asking for secure/insecure random data".
> 
> And then say that the existing bit #1 just means "I want to wait for entropy".
> 
> So then you end up with this:
> 
>     /*
>      * Flags for getrandom(2)
>      *
>      * GRND_NONBLOCK    Don't block and return EAGAIN instead
>      * GRND_WAIT_ENTROPY        Explicitly wait for entropy
>      * GRND_EXPLICIT    Make it clear you know what you are doing
>      */
>     #define GRND_NONBLOCK               0x0001
>     #define GRND_WAIT_ENTROPY   0x0002
>     #define GRND_EXPLICIT               0x0004
> 
>     #define GRND_SECURE (GRND_EXPLICIT | GRND_WAIT_ENTROPY)
>     #define GRND_INSECURE       (GRND_EXPLICIT | GRND_NONBLOCK)
> 
>     /* Nobody wants /dev/random behavior, nobody should use it */
>     #define GRND_RANDOM 0x0002
> 
> which is actually fairly easy to understand. So now we have three
> bits, and the values are:
> 
>  000  - ambiguous "secure or just lazy/ignorant"
>  001 - -EAGAIN or secure
>  010 - blocking /dev/random DO NOT USE
>  011 - nonblocking /dev/random DO NOT USE
>  100 - nonsense, returns -EINVAL
>  101 - /dev/urandom without warnings
>  110 - blocking secure
>  111 - -EAGAIN or secure
>
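
As a rough illustration of the eight combinations in the quoted table,
the three proposed bits can be decoded as below. The PROPOSED_ prefix,
the enum and the function name are made up for this sketch (the prefix
only avoids clashing with any GRND_* macros a libc header may already
define); the bit values and outcomes are copied from the quoted text.

```c
/* Bit values as proposed in the quoted mail; not a merged ABI. */
#define PROPOSED_GRND_NONBLOCK		0x0001
#define PROPOSED_GRND_WAIT_ENTROPY	0x0002	/* same bit as old GRND_RANDOM */
#define PROPOSED_GRND_EXPLICIT		0x0004

#define PROPOSED_GRND_SECURE \
	(PROPOSED_GRND_EXPLICIT | PROPOSED_GRND_WAIT_ENTROPY)
#define PROPOSED_GRND_INSECURE \
	(PROPOSED_GRND_EXPLICIT | PROPOSED_GRND_NONBLOCK)

/* Hypothetical labels for the outcomes listed in the table. */
enum grnd_mode {
	MODE_AMBIGUOUS,			/* 000 */
	MODE_EAGAIN_OR_SECURE,		/* 001 and 111 */
	MODE_DEV_RANDOM_BLOCKING,	/* 010, DO NOT USE */
	MODE_DEV_RANDOM_NONBLOCKING,	/* 011, DO NOT USE */
	MODE_EINVAL,			/* 100, and any unknown bit */
	MODE_URANDOM_NO_WARN,		/* 101 */
	MODE_BLOCKING_SECURE		/* 110 */
};

enum grnd_mode classify_flags(unsigned int flags)
{
	/* The existing syscall already rejects unknown flag bits. */
	if (flags & ~0x7u)
		return MODE_EINVAL;

	switch (flags) {
	case 0:
		return MODE_AMBIGUOUS;
	case PROPOSED_GRND_NONBLOCK:
		return MODE_EAGAIN_OR_SECURE;
	case PROPOSED_GRND_WAIT_ENTROPY:
		return MODE_DEV_RANDOM_BLOCKING;
	case PROPOSED_GRND_WAIT_ENTROPY | PROPOSED_GRND_NONBLOCK:
		return MODE_DEV_RANDOM_NONBLOCKING;
	case PROPOSED_GRND_EXPLICIT:
		return MODE_EINVAL;
	case PROPOSED_GRND_INSECURE:
		return MODE_URANDOM_NO_WARN;
	case PROPOSED_GRND_SECURE:
		return MODE_BLOCKING_SECURE;
	default:	/* all three bits set */
		return MODE_EAGAIN_OR_SECURE;
	}
}
```

Only three of the eight inputs are meant to be used by new code:
PROPOSED_GRND_INSECURE, PROPOSED_GRND_SECURE, and
PROPOSED_GRND_SECURE | PROPOSED_GRND_NONBLOCK.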

Hmmm, the point of the new syscall was **exactly** to avoid the 2^3
combinations above, and to provide developers only two, sane and easy,
options:

  - GRND2_INSECURE
  - GRND2_SECURE_UNBOUNDED_INITIAL_WAIT

You *must* pick one of these, and that's it. (!)

Then the proposed getrandom_wait(7) manpage, also mentioned in the V4
patch WARN message, would provide a big rationale, and encourage
everyone to use the new getrandom2(2) syscall instead.

But yeah, maybe we should add the extra flags to the old getrandom()
instead, and let glibc implement a getrandom_safe(3) wrapper only
with the sane options available.

Problem is, glibc is still *really* slow in adopting linux syscall
wrappers, so I'm not optimistic about that...

I still see the new system call as the sanest path, even provided
the cost of a new syscall number..

@Linus, @Ted:  Final thoughts?

> and people would be encouraged to use one of these three:
> 
>  - GRND_INSECURE
>  - GRND_SECURE
>  - GRND_SECURE | GRND_NONBLOCK
> 
> all of which actually make sense, and none of which have any
> ambiguity. And while "GRND_INSECURE | GRND_NONBLOCK" works, it's
> exactly the same as just plain GRND_INSECURE - the point is that it
> doesn't block for entropy anyway, so non-blocking makes no difference.
>

[...]

> 
> There is *one* other small semantic change: The old code did
> urandom_read() which added warnings, but each warning also _reset_ the
> crng_init_cnt. Until it decided not to warn any more, at which point
> it also stops that resetting of crng_init_cnt.
> 
> And that reset of crng_init_cnt, btw, is some cray cray.
> 
> It's basically a "we used up entropy" thing, which is very
> questionable to begin with as the whole discussion has shown, but
> since it stops doing it after 10 cases, it's not even good security
> assuming the "use up entropy" case makes sense in the first place.
> 
> So I didn't copy that insanity either. And I'm wondering if removing
> it from /dev/urandom might also end up helping Ahmed's case of getting
> entropy earlier, when we don't reset the counter.
>

Yeah, noticed that, but I've learned not to change crypto or
speculative-execution code even if the changes "just look the same" at
first glance ;-)

(out of curiosity, I'll do a quick test with this CRNG entropy reset
part removed. Maybe it was indeed part of the problem..)

> But other than those two details, none of the existing semantics
> changed, we just added the three actually _sane_ cases without any
> ambiguity.
> 
> In particular, this still leaves the semantics of that nasty
> "getrandom(0)" as the same "blocking urandom" that it currently is.
> But now it's a separate case, and we can make that perhaps do the
> timeout, or at least the warning.
>

Yeah, I would propose to keep the V4-submitted "timeout then WARN"
logic. This alone will give user-space / distributions time to adapt.

For example, it was interesting that even the 0day bot had limited
entropy on boot (virtio-rng / TRUST_CPU not enabled):

    https://lkml.kernel.org/r/20190920005120.GP15734@shao2-debian

If user-space doesn't get its act together, then the other extreme
measures can be implemented later (the getrandom() length test, using
jitter as a credited kernel entropy source, etc., etc.)

> And the new cases are defined to *not* warn. In particular,
> GRND_INSECURE very much does *not* warn about early urandom access
> when crng isn't ready. Because the whole point of that new mode is
> that the user knows it isn't secure.
>
> So that should make getrandom(GRND_INSECURE) palatable to the systemd
> kind of use that wanted to avoid the pointless kernel warning.
>

Yup, that's what was in the submitted V4 patch too. The caller
explicitly asked for "insecure", so they know what they're doing.

getrandom2(2) never prints any kernel message.

> And we could mark this for stable and try to get it backported so that
> it will have better coverage, and encourage people to use the new sane
> _explicit_ waiting (or not) for entropy.
>

ACK. I'll wait for an answer to the "Final thoughts?" question above,
send a V5 with CC:stable, then disappear from this thread ;-)

Thanks a lot everyone!

--
Ahmed Darwish


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 13:46                                     ` Ahmed S. Darwish
@ 2019-09-20 14:33                                       ` Andy Lutomirski
  2019-09-20 16:29                                         ` Linus Torvalds
  2019-09-20 17:26                                       ` Willy Tarreau
  1 sibling, 1 reply; 211+ messages in thread
From: Andy Lutomirski @ 2019-09-20 14:33 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Linus Torvalds, Lennart Poettering, Theodore Y. Ts'o,
	Eric W. Biederman, Alexander E. Patrakov, Michael Kerrisk,
	Willy Tarreau, Matthew Garrett, lkml, Ext4 Developers List,
	Linux API, linux-man

On Fri, Sep 20, 2019 at 6:46 AM Ahmed S. Darwish <darwish.07@gmail.com> wrote:
>
> Hi,
>
> On Wed, Sep 18, 2019 at 04:57:58PM -0700, Linus Torvalds wrote:
> > On Wed, Sep 18, 2019 at 2:17 PM Ahmed S. Darwish <darwish.07@gmail.com> wrote:
> > >
> > > Since Linux v3.17, getrandom(2) has been created as a new and more
> > > secure interface for pseudorandom data requests.  It attempted to
> > > solve three problems, as compared to /dev/urandom:
> > >
> > I don't think your patch is really _wrong_, but I think it's silly to
> > introduce a new system call, when we have 30 bits left in the flags of
> > the old one, and the old system call checked them.
> >
> > So it's much simpler and more straightforward to  just introduce a
> > single new bit #2 that says "I actually know what I'm doing, and I'm
> > explicitly asking for secure/insecure random data".
> >
> > And then say that the existing bit #1 just means "I want to wait for entropy".
> >
> > So then you end up with this:
> >
> >     /*
> >      * Flags for getrandom(2)
> >      *
> >      * GRND_NONBLOCK    Don't block and return EAGAIN instead
> >      * GRND_WAIT_ENTROPY        Explicitly wait for entropy
> >      * GRND_EXPLICIT    Make it clear you know what you are doing
> >      */
> >     #define GRND_NONBLOCK               0x0001
> >     #define GRND_WAIT_ENTROPY   0x0002
> >     #define GRND_EXPLICIT               0x0004

What is this GRND_EXPLICIT thing?

A few weeks ago, I sent a whole series to address this, and I
obviously didn't cc enough people.  I'll resend a rebased version
today.  Meanwhile, some comments on this whole mess:

As I think everyone mostly agrees in this whole thread, getrandom()
can't just magically start returning non-random results.  That would
be a big problem.

Linus, I disagree that blocking while waiting for randomness is an
error.  Sometimes you want to generate a key, you want to finish as
quickly as possible, and you don't want to be in the business of
fiddling with the setup of the kernel RNG.  I would argue that *most*
crypto applications are in this category.  I think that the kernel
should, instead, handle this mess itself.  As a first pass, it could
be as simple as noticing that someone is blocking on randomness and
kicking off a thread that does some randomish reads to the rootfs.
This would roughly simulate the old behavior in which an ext4 rootfs
did more IO than necessary.  A fancier version would, as discussed in
this thread, do more clever things.

(As an aside, I am not a fan of xoring or adding stuff to the CRNG
state.  We should just use an actual crypto primitive for this.
Accumulate the state in a buffer and SHA-512 it.  Or use something
like the Keccak duplex sponge.  But this is a discussion for another
day.)

So I'm going to resend my series.  You can all fight over whether the
patch that actually goes in should be based on my series or based on
this patch.

--Andy


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 14:33                                       ` Andy Lutomirski
@ 2019-09-20 16:29                                         ` Linus Torvalds
  2019-09-20 17:52                                           ` Andy Lutomirski
  0 siblings, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-20 16:29 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ahmed S. Darwish, Lennart Poettering, Theodore Y. Ts'o,
	Eric W. Biederman, Alexander E. Patrakov, Michael Kerrisk,
	Willy Tarreau, Matthew Garrett, lkml, Ext4 Developers List,
	Linux API, linux-man

On Fri, Sep 20, 2019 at 7:34 AM Andy Lutomirski <luto@kernel.org> wrote:
>
> What is this GRND_EXPLICIT thing?

Your own email gives the explanation:

> Linus, I disagree that blocking while waiting for randomness is an
> error.  Sometimes you want to generate a key

That's *exactly* why GRND_EXPLICIT needs to be done regardless.

The keyword there is "Sometimes".

But people currently use "getrandom(0)" when they DO NOT want a key,
they just want some miscellaneous random numbers for some totally
non-security-related reason.

And that will continue. Exactly because the people who do not want a
key by definition aren't thinking about it very hard.

So the interface was very much mis-designed from the get-go. It was
designed purely for key people, even though generating keys is by no
means the most common reason for wanting a block of "random" numbers.

So GRND_EXPLICIT is there very much to make sure people who want true
secure keys will say so, and five years from now we will not have the
confusion between "Oh, I wasn't thinking about bootup". Because at a
minimum, in the near future getrandom(0) will warn about the
ambiguity. Or it will use some questionable jitter entropy that some
real key users will look at sideways and go "I don't want that".

This is an ABI design issue. The old ABI was fundamentally misdesigned
and actively encouraged the current situation of mixing secure and
insecure callers for that getrandom(0).

And it's entirely orthogonal to _any_ actual technical change we will
do (like removing the old GRND_RANDOM behavior entirely, which is
insane for other reasons and nobody ever wanted or likely used).

            Linus


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 13:46                                     ` Ahmed S. Darwish
  2019-09-20 14:33                                       ` Andy Lutomirski
@ 2019-09-20 17:26                                       ` Willy Tarreau
  2019-09-20 17:56                                         ` Ahmed S. Darwish
  1 sibling, 1 reply; 211+ messages in thread
From: Willy Tarreau @ 2019-09-20 17:26 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Linus Torvalds, Lennart Poettering, Theodore Y. Ts'o,
	Eric W. Biederman, Alexander E. Patrakov, Michael Kerrisk,
	Matthew Garrett, lkml, linux-ext4, linux-api, linux-man

Hi Ahmed,

On Fri, Sep 20, 2019 at 03:46:09PM +0200, Ahmed S. Darwish wrote:
> Problem is, glibc is still *really* slow in adopting linux syscall
> wrappers, so I'm not optimistic about that...
>
> I still see the new system call as the sanest path, even provided
> the cost of a new syscall number..

New syscalls are always a pain to deal with in userland, because when
they are introduced, everyone wants them long before they're available
in glibc. So userland has to define NR_xxx for each supported arch and
to perform the call itself.

With flags, adoption is instantaneous. Just #ifndef/#define, check if
the flag is supported and that's done. The only valid reason for a new
syscall is when the API changes (e.g. one extra arg, a la accept4()),
which doesn't seem to be the case here. Otherwise please by all means
avoid this in general.
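
As a rough illustration of the flag approach, here is a minimal sketch
in C. It assumes a new flag called GRND_INSECURE (the name floated in
this thread) with an illustrative value of 0x0004; both the name and
the value are assumptions, not a settled ABI. grnd_flag_supported() is
a hypothetical helper, not an existing libc function.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Existing flag; defined here only if no header provided it. */
#ifndef GRND_NONBLOCK
#define GRND_NONBLOCK 0x0001
#endif

/* Hypothetical flag: name from this thread, value assumed for illustration. */
#ifndef GRND_INSECURE
#define GRND_INSECURE 0x0004
#endif

/*
 * Probe whether the running kernel accepts `flag`.
 * Returns 1 if accepted, 0 if rejected with EINVAL, -1 on any other error.
 * GRND_NONBLOCK is OR-ed in so that the probe itself never sleeps.
 */
static int grnd_flag_supported(unsigned int flag)
{
	unsigned char probe;
	long ret = syscall(SYS_getrandom, &probe, sizeof(probe),
			   flag | GRND_NONBLOCK);

	if (ret >= 0 || errno == EAGAIN)
		return 1;	/* flag understood; the pool may just be unseeded */
	return errno == EINVAL ? 0 : -1;
}
```

A caller would run grnd_flag_supported(GRND_INSECURE) once at startup
and cache the answer. No per-arch syscall number table is needed, which
is the point being made above.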

Thanks,
Willy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 16:29                                         ` Linus Torvalds
@ 2019-09-20 17:52                                           ` Andy Lutomirski
  2019-09-20 18:09                                             ` Linus Torvalds
                                                               ` (2 more replies)
  0 siblings, 3 replies; 211+ messages in thread
From: Andy Lutomirski @ 2019-09-20 17:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Ahmed S. Darwish, Lennart Poettering,
	Theodore Y. Ts'o, Eric W. Biederman, Alexander E. Patrakov,
	Michael Kerrisk, Willy Tarreau, Matthew Garrett, lkml,
	Ext4 Developers List, Linux API, linux-man

On Fri, Sep 20, 2019 at 9:30 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, Sep 20, 2019 at 7:34 AM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > What is this GRND_EXPLICIT thing?
>
> Your own email gives the explanation:
>
> > Linus, I disagree that blocking while waiting for randomness is an
> > error.  Sometimes you want to generate a key
>
> That's *exactly* why GRND_EXPLICIT needs to be done regardless.
>
> The keyword there is "Sometimes".
>
> But people currently use "getrandom(0)" when they DO NOT want a key,
> they just want some miscellaneous random numbers for some totally
> non-security-related reason.
>
> And that will continue. Exactly because the people who do not want a
> key by definition aren't thinking about it very hard.

I fully agree that this is a problem.  It's a problem we brought on
ourselves because we screwed up the ABI from the beginning.  The
question is what to do about it that doesn't cause its own set of
nasty problems.

> So GRND_EXPLICIT is there very much to make sure people who want true
> secure keys will say so, and five years from now we will not have the
> confusion between "Oh, I wasn't thinking about bootup". Because at a
> minimum, in the near future getrandom(0) will warn about the
> ambiguity. Or it will use some questionable jitter entropy that some
> real key users will look at sideways and go "I don't want that".

There are programs that call getrandom(0) *today* that expect secure
output.  openssl does a horrible dance in which it calls getentropy()
if available and falls back to syscall(__NR_getrandom, buf, buflen, 0)
otherwise.  We can't break this use case.  Changing the semantics of
getrandom(0) out from under them seems like the worst kind of ABI
break -- existing applications will *appear* to continue working but
will, in fact, become insecure.

IMO, from the beginning, we should have done this:

GRND_INSECURE: insecure.  always works.

GRND_SECURE_BLOCKING: does exactly what it says.

0: -EINVAL.

Using it correctly would be obvious.  Something like GRND_EXPLICIT
would be a head-scratcher: people would have to look at the man page
and actually think about it, and it's still easy to get wrong:

getrandom(..., GRND_EXPLICIT): just fscking give me a number.  it
seems to work and it shuts up the warning

And we're back to square one.


I think that, given existing software, we should make two or three
changes to fix the basic problems here:

1. Add GRND_INSECURE: at least let new applications do the right thing
going forward.

2. Fix what is arguably a straight up kernel bug, not even an ABI
issue: when a user program is blocking in getrandom(..., 0), the
kernel happily sits there doing absolutely nothing and deadlocks the
system as a result.  This IMO isn't an ABI issue -- it's an
implementation problem.  How about we make getrandom() (probably
actually wait_for_random_bytes()) do something useful to try to seed
the RNG if the system is otherwise not doing IO.

3. Optionally, entirely in user code: Get glibc to add new *library*
functions: getentropy_secure_blocking() and getentropy_insecure() or
whatever they want to call them.  Deprecate getentropy().

I think #2 is critical.  Right now, suppose someone has a system that
needs to do a secure network request (a la Red Hat's Clevis).  I have
no idea what Clevis actually does, but it wouldn't be particularly
crazy to do a DH exchange or sign with an EC key to ask some network
server to help unlock a dm-crypt volume.  If the system does this at
boot, it needs to use getrandom(..., 0), GRND_EXPLICIT, or whatever,
because it NEEDS a secure random number.  No amount of ABI fiddling
will change this.  The kernel should *work* in this case rather than
deadlocking.

--Andy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 17:26                                       ` Willy Tarreau
@ 2019-09-20 17:56                                         ` Ahmed S. Darwish
  0 siblings, 0 replies; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-20 17:56 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, Lennart Poettering, Theodore Y. Ts'o,
	Eric W. Biederman, Alexander E. Patrakov, Michael Kerrisk,
	Matthew Garrett, lkml, linux-ext4, linux-api, linux-man

On Fri, Sep 20, 2019 at 07:26:09PM +0200, Willy Tarreau wrote:
> Hi Ahmed,
> 
> On Fri, Sep 20, 2019 at 03:46:09PM +0200, Ahmed S. Darwish wrote:
> > Problem is, glibc is still *really* slow in adopting linux syscall
> > wrappers, so I'm not optimistic about that...
> >
> > I still see the new system call as the sanest path, even provided
> > the cost of a new syscall number..
> 
> New syscalls are always a pain to deal with in userland, because when
> they are introduced, everyone wants them long before they're available
> in glibc. So userland has to define NR_xxx for each supported arch and
> to perform the call itself.
> 
> With flags, adoption is instantaneous. Just #ifndef/#define, check if
> the flag is supported and that's done. The only valid reason for a new
> syscall is when the API changes (e.g. one extra arg, a la accept4()),
> which doesn't seem to be the case here. Otherwise please by all means
> avoid this in general.
> 

I see. Thanks a lot for the explanation above :)

--
Ahmed Darwish

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 17:52                                           ` Andy Lutomirski
@ 2019-09-20 18:09                                             ` Linus Torvalds
  2019-09-20 18:16                                               ` Willy Tarreau
                                                                 ` (2 more replies)
  2019-09-20 18:12                                             ` Willy Tarreau
  2019-09-20 18:15                                             ` Alexander E. Patrakov
  2 siblings, 3 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-20 18:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ahmed S. Darwish, Lennart Poettering, Theodore Y. Ts'o,
	Eric W. Biederman, Alexander E. Patrakov, Michael Kerrisk,
	Willy Tarreau, Matthew Garrett, lkml, Ext4 Developers List,
	Linux API, linux-man

On Fri, Sep 20, 2019 at 10:52 AM Andy Lutomirski <luto@kernel.org> wrote:
>
> IMO, from the beginning, we should have done this:
>
> GRND_INSECURE: insecure.  always works.
>
> GRND_SECURE_BLOCKING: does exactly what it says.
>
> 0: -EINVAL.

Violently agreed. And that's kind of what the GRND_EXPLICIT is really
aiming for.

However, it's worth noting that nobody should ever use GRND_EXPLICIT
directly. That's just the name for the bit. The actual users would use
GRND_INSECURE or GRND_SECURE.

And yes, maybe it's worth making the name be GRND_SECURE_BLOCKING just
to make people see what the big deal is.

In the meantime, we need that new bit just to be able to create the
new semantics eventually. With a warning to nudge people in the right
direction.

We may never be able to return -EINVAL, but we can add the pr_notice()
to discourage people from using it.

And yes, we'll have to block - at least for a time - to get some
entropy. But at some point we either start making entropy up, or we
say "0 means jitter-entropy for ten seconds".

That will _work_, but it will also make the security-people nervous,
which is just one more hint that they should move to
GRND_SECURE[_BLOCKING].

> getrandom(..., GRND_EXPLICIT): just fscking give me a number.  it
> seems to work and it shuts up the warning
>
> And we're back to square one.

Actually, you didn't read the GRND_INSECURE patch, did you.

getrandom(GRND_EXPLICIT) on its own returns -EINVAL.

Because yes, I thought about it, and yes, I agree that it's the same
as the old 0.

So GRND_EXPLICIT is a bit that basically means "I am explicit about
what behavior I want". But part of that is that you need to _state_
the behavior too.

So:

 - GRND_INSECURE is (GRND_EXPLICIT | GRND_NONBLOCK)

   As in "I explicitly ask you to just not ever block": urandom

 - GRND_SECURE_BLOCKING is (GRND_EXPLICIT | GRND_RANDOM)

   As in "I explicitly ask you for those secure random numbers"

 - GRND_SECURE_NONBLOCKING is (GRND_EXPLICIT | GRND_RANDOM | GRND_NONBLOCK)

   As in "I want explicitly secure random numbers, but return -EAGAIN
if that would block".

Which are the three sane behaviors (that last one is useful for the "I
can try to generate entropy if you don't have any" case. I'm not sure
anybody will do it, but it definitely conceptually makes sense).
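
To make the composition concrete, here is a small sketch of the flag
layout and the validation rule described above. The GRND_EXPLICIT value
(0x0010) is an assumption for illustration, and grnd_classify() with
its enum is a hypothetical helper, not the code from the patch.

```c
#include <errno.h>

/* Existing getrandom() flags; defined here only if no header did. */
#ifndef GRND_NONBLOCK
#define GRND_NONBLOCK 0x0001
#endif
#ifndef GRND_RANDOM
#define GRND_RANDOM   0x0002
#endif

/* Hypothetical bit from this thread; the value is an assumption. */
#define GRND_EXPLICIT 0x0010

#define GRND_INSECURE           (GRND_EXPLICIT | GRND_NONBLOCK)
#define GRND_SECURE_BLOCKING    (GRND_EXPLICIT | GRND_RANDOM)
#define GRND_SECURE_NONBLOCKING (GRND_EXPLICIT | GRND_RANDOM | GRND_NONBLOCK)

enum grnd_mode {
	GRND_MODE_LEGACY,		/* no GRND_EXPLICIT: old, ambiguous call */
	GRND_MODE_INSECURE,
	GRND_MODE_SECURE_BLOCKING,
	GRND_MODE_SECURE_NONBLOCKING,
};

/* Returns a grnd_mode value, or -EINVAL for a rejected combination. */
static int grnd_classify(unsigned int flags)
{
	if (flags & ~(GRND_EXPLICIT | GRND_RANDOM | GRND_NONBLOCK))
		return -EINVAL;		/* unknown bits */
	if (!(flags & GRND_EXPLICIT))
		return GRND_MODE_LEGACY;

	switch (flags) {
	case GRND_INSECURE:
		return GRND_MODE_INSECURE;
	case GRND_SECURE_BLOCKING:
		return GRND_MODE_SECURE_BLOCKING;
	case GRND_SECURE_NONBLOCKING:
		return GRND_MODE_SECURE_NONBLOCKING;
	default:
		return -EINVAL;		/* GRND_EXPLICIT on its own */
	}
}
```

The default case is only reachable for a bare GRND_EXPLICIT, which is
the "you must also state the behavior" rule.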

And I agree that your naming is better.

I had it as just "GRND_SECURE" for the blocking version, and
"GRND_SECURE | GRND_NONBLOCK" for the "secure but return EAGAIN if you
would need to block for entropy" version.

But explicitly stating the blockingness in the name makes it clearer
to the people who just want GRND_INSECURE, and makes them realize that
they don't want the blocking version.

             Linus

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 17:52                                           ` Andy Lutomirski
  2019-09-20 18:09                                             ` Linus Torvalds
@ 2019-09-20 18:12                                             ` Willy Tarreau
  2019-09-20 19:22                                               ` Andy Lutomirski
  2019-09-20 18:15                                             ` Alexander E. Patrakov
  2 siblings, 1 reply; 211+ messages in thread
From: Willy Tarreau @ 2019-09-20 18:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Ahmed S. Darwish, Lennart Poettering,
	Theodore Y. Ts'o, Eric W. Biederman, Alexander E. Patrakov,
	Michael Kerrisk, Matthew Garrett, lkml, Ext4 Developers List,
	Linux API, linux-man

Hi Andy,

On Fri, Sep 20, 2019 at 10:52:30AM -0700, Andy Lutomirski wrote:
> 2. Fix what is arguably a straight up kernel bug, not even an ABI
> issue: when a user program is blocking in getrandom(..., 0), the
> kernel happily sits there doing absolutely nothing and deadlocks the
> system as a result.  This IMO isn't an ABI issue -- it's an
> implementation problem.  How about we make getrandom() (probably
> actually wait_for_random_bytes()) do something useful to try to seed
> the RNG if the system is otherwise not doing IO.

I thought about it as well with my old MSDOS reflexes, but here I
doubt we can do a lot. It seems fishy to me to start to fiddle with
various drivers from within a getrandom() syscall, we could sometimes
even end up waiting even longer because one device is already locked,
and when we have access there, there's not much we can do without
risking causing some harm. On desktop systems you have a bit more
choice than on headless systems (blink keyboard leds and time the
interrupts, run some disk accesses when there's still a disk, get a
copy of the last buffer of the audio input and/or output, turn on
the microphone and/or webcam, and collect some data). Many of them
cannot always be used. We could do some more portable stuff like scan
and hash the totality of the RAM. But that's all quite bad and
unreliable and at this point it's better to tell userland "here's
what I could get for you, if you want better, do it yourself" and the
userland can then ask the user "dear user, I really need valid entropy
this time to generate your GPG key, please type frantically on this
keyboard". And it will be more reliable this way in my opinion.

My analysis of the problem precisely lies in the fact that we've
always considered that the kernel had to provide randoms for any
use case and had to cover the most difficult cases and imposed
their constraints on simplest ones. Better let the application
decide.

Willy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 17:52                                           ` Andy Lutomirski
  2019-09-20 18:09                                             ` Linus Torvalds
  2019-09-20 18:12                                             ` Willy Tarreau
@ 2019-09-20 18:15                                             ` Alexander E. Patrakov
  2019-09-20 18:29                                               ` Andy Lutomirski
  2 siblings, 1 reply; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-20 18:15 UTC (permalink / raw)
  To: Andy Lutomirski, Linus Torvalds
  Cc: Ahmed S. Darwish, Lennart Poettering, Theodore Y. Ts'o,
	Eric W. Biederman, Michael Kerrisk, Willy Tarreau,
	Matthew Garrett, lkml, Ext4 Developers List, Linux API,
	linux-man


[-- Attachment #1: Type: text/plain, Size: 2133 bytes --]

20.09.2019 22:52, Andy Lutomirski wrote:
> I think that, given existing software, we should make two or three
> changes to fix the basic problems here:
> 
> 1. Add GRND_INSECURE: at least let new applications do the right thing
> going forward.
> 
> 2. Fix what is arguably a straight up kernel bug, not even an ABI
> issue: when a user program is blocking in getrandom(..., 0), the
> kernel happily sits there doing absolutely nothing and deadlocks the
> system as a result.  This IMO isn't an ABI issue -- it's an
> implementation problem.  How about we make getrandom() (probably
> actually wait_for_random_bytes()) do something useful to try to seed
> the RNG if the system is otherwise not doing IO.
> 
> 3. Optionally, entirely in user code: Get glibc to add new *library*
> functions: getentropy_secure_blocking() and getentropy_insecure() or
> whatever they want to call them.  Deprecate getentropy().
> 
> I think #2 is critical.  Right now, suppose someone has a system that
> needs to do a secure network request (a la Red Hat's Clevis).  I have
> no idea what Clevis actually does, but it wouldn't be particularly
> crazy to do a DH exchange or sign with an EC key to ask some network
> server to help unlock a dm-crypt volume.  If the system does this at
> boot, it needs to use getrandom(..., 0), GRND_EXPLICIT, or whatever,
> because it NEEDS a secure random number.  No amount of ABI fiddling
> will change this.  The kernel should *work* in this case rather than
> deadlocking.

Let me express a little bit of disagreement with the logic here.

I do agree that #2 is critical, and the Clevis use case is a perfect 
example why it is important. I doubt that it is solvable without 
trusting jitter entropy, or without provoking a dummy read on a random 
block device, just for timings, or maybe some other interaction with the 
external world - but Willy already said "it seems fishy". However, _if_ 
it is solved, then we don't need GRND_INSECURE, because solving #2 is 
equivalent to magically making secure random numbers always available.

-- 
Alexander E. Patrakov


[-- Attachment #2: S/MIME cryptographic signature --]
[-- Type: application/pkcs7-signature, Size: 4052 bytes --]

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 18:09                                             ` Linus Torvalds
@ 2019-09-20 18:16                                               ` Willy Tarreau
  2019-09-20 19:12                                               ` Andy Lutomirski
  2019-09-21  6:07                                               ` Florian Weimer
  2 siblings, 0 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-20 18:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Ahmed S. Darwish, Lennart Poettering,
	Theodore Y. Ts'o, Eric W. Biederman, Alexander E. Patrakov,
	Michael Kerrisk, Matthew Garrett, lkml, Ext4 Developers List,
	Linux API, linux-man

On Fri, Sep 20, 2019 at 11:09:53AM -0700, Linus Torvalds wrote:
(...)
> So:
> 
>  - GRND_INSECURE is (GRND_EXPLICIT | GRND_NONBLOCK)
> 
>    As in "I explicitly ask you to just not ever block": urandom
> 
>  - GRND_SECURE_BLOCKING is (GRND_EXPLICIT | GRND_RANDOM)
> 
>    As in "I explicitly ask you for those secure random numbers"
> 
>  - GRND_SECURE_NONBLOCKING is (GRND_EXPLICIT | GRND_RANDOM | GRND_NONBLOCK)
> 
>    As in "I want explicitly secure random numbers, but return -EAGAIN
> if that would block".
> 
> Which are the three sane behaviors (that last one is useful for the "I
> can try to generate entropy if you don't have any" case. I'm not sure
> anybody will do it, but it definitely conceptually makes sense).
> 
> And I agree that your naming is better.
> 
> I had it as just "GRND_SECURE" for the blocking version, and
> "GRND_SECURE | GRND_NONBLOCK" for the "secure but return EAGAIN if you
> would need to block for entropy" version.
> 
> But explicitly stating the blockingness in the name makes it clearer
> to the people who just want GRND_INSECURE, and makes them realize that
> they don't want the blocking version.

I really like it this way. Explicit and full control for the application
plus reasonable backwards compatibility, it sounds pretty good.

Willy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 18:15                                             ` Alexander E. Patrakov
@ 2019-09-20 18:29                                               ` Andy Lutomirski
  0 siblings, 0 replies; 211+ messages in thread
From: Andy Lutomirski @ 2019-09-20 18:29 UTC (permalink / raw)
  To: Alexander E. Patrakov
  Cc: Andy Lutomirski, Linus Torvalds, Ahmed S. Darwish,
	Lennart Poettering, Theodore Y. Ts'o, Eric W. Biederman,
	Michael Kerrisk, Willy Tarreau, Matthew Garrett, lkml,
	Ext4 Developers List, Linux API, linux-man



> On Sep 20, 2019, at 11:15 AM, Alexander E. Patrakov <patrakov@gmail.com> wrote:
> 
> 20.09.2019 22:52, Andy Lutomirski wrote:
>> I think that, given existing software, we should make two or three
>> changes to fix the basic problems here:
>> 1. Add GRND_INSECURE: at least let new applications do the right thing
>> going forward.
>> 2. Fix what is arguably a straight up kernel bug, not even an ABI
>> issue: when a user program is blocking in getrandom(..., 0), the
>> kernel happily sits there doing absolutely nothing and deadlocks the
>> system as a result.  This IMO isn't an ABI issue -- it's an
>> implementation problem.  How about we make getrandom() (probably
>> actually wait_for_random_bytes()) do something useful to try to seed
>> the RNG if the system is otherwise not doing IO.
>> 3. Optionally, entirely in user code: Get glibc to add new *library*
>> functions: getentropy_secure_blocking() and getentropy_insecure() or
>> whatever they want to call them.  Deprecate getentropy().
>> I think #2 is critical.  Right now, suppose someone has a system that
>> needs to do a secure network request (a la Red Hat's Clevis).  I have
>> no idea what Clevis actually does, but it wouldn't be particularly
>> crazy to do a DH exchange or sign with an EC key to ask some network
>> server to help unlock a dm-crypt volume.  If the system does this at
>> boot, it needs to use getrandom(..., 0), GRND_EXPLICIT, or whatever,
>> because it NEEDS a secure random number.  No amount of ABI fiddling
>> will change this.  The kernel should *work* in this case rather than
>> deadlocking.
> 
> Let me express a little bit of disagreement with the logic here.
> 
> I do agree that #2 is critical, and the Clevis use case is a perfect
> example why it is important. I doubt that it is solvable without
> trusting jitter entropy, or without provoking a dummy read on a random
> block device, just for timings, or maybe some other interaction with the
> external world - but Willy already said "it seems fishy". However, _if_
> it is solved, then we don't need GRND_INSECURE, because solving #2 is
> equivalent to magically making secure random numbers always available.
> 
> 

I beg to differ. There is a big difference between “do your best *right now*” and “give me a real secure result in a vaguely timely manner”.

For example, the former is useful for ASLR or hash table randomization. The latter is not.

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 18:09                                             ` Linus Torvalds
  2019-09-20 18:16                                               ` Willy Tarreau
@ 2019-09-20 19:12                                               ` Andy Lutomirski
  2019-09-20 19:51                                                 ` Linus Torvalds
  2019-09-21  6:07                                               ` Florian Weimer
  2 siblings, 1 reply; 211+ messages in thread
From: Andy Lutomirski @ 2019-09-20 19:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Ahmed S. Darwish, Lennart Poettering,
	Theodore Y. Ts'o, Eric W. Biederman, Alexander E. Patrakov,
	Michael Kerrisk, Willy Tarreau, Matthew Garrett, lkml,
	Ext4 Developers List, Linux API, linux-man

> On Sep 20, 2019, at 11:10 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Fri, Sep 20, 2019 at 10:52 AM Andy Lutomirski <luto@kernel.org> wrote:
>>
>> IMO, from the beginning, we should have done this:
>>
>> GRND_INSECURE: insecure.  always works.
>>
>> GRND_SECURE_BLOCKING: does exactly what it says.
>>
>> 0: -EINVAL.
>
> Violently agreed. And that's kind of what the GRND_EXPLICIT is really
> aiming for.
>
> However, it's worth noting that nobody should ever use GRND_EXPLICIT
> directly. That's just the name for the bit. The actual users would use
> GRND_INSECURE or GRND_SECURE.
>
> And yes, maybe it's worth making the name be GRND_SECURE_BLOCKING just
> to make people see what the big deal is.
>
> In the meantime, we need that new bit just to be able to create the
> new semantics eventually. With a warning to nudge people in the right
> direction.
>
> We may never be able to return -EINVAL, but we can add the pr_notice()
> to discourage people from using it.
>

The problem is that new programs will have to try the new flag value
and, if it returns -EINVAL, fall back to 0.  This isn't so great.
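
For illustration, the fallback dance would look roughly like this in
userland. The flag value 0x0012 is an assumption (a hypothetical
GRND_EXPLICIT bit of 0x0010 combined with GRND_RANDOM), and
secure_random() is a made-up helper name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical: GRND_EXPLICIT (assumed 0x0010) | GRND_RANDOM (0x0002). */
#define GRND_SECURE_BLOCKING_SKETCH 0x0012

/*
 * Ask for secure random bytes with the new explicit flag; if the kernel
 * rejects the flag with EINVAL, retry with the legacy flags value 0.
 * Returns the number of bytes written, or -1 with errno set.
 */
static long secure_random(void *buf, size_t len)
{
	long ret = syscall(SYS_getrandom, buf, len,
			   GRND_SECURE_BLOCKING_SKETCH);

	if (ret < 0 && errno == EINVAL)
		ret = syscall(SYS_getrandom, buf, len, 0);
	return ret;
}
```

Every new program that wants to run on old kernels carries the second
syscall() forever, so the legacy 0 path never really goes away.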

> And yes, we'll have to block - at least for a time - to get some
> entropy. But at some point we either start making entropy up, or we
> say "0 means jitter-entropy for ten seconds".
>
> That will _work_, but it will also make the security-people nervous,
> which is just one more hint that they should move to
> GRND_SECURE[_BLOCKING].

Wait, are you suggesting that 0 means invoke jitter-entropy or
whatever and GRND_SECURE_BLOCKING means wait forever and deadlock?
That's no good -- people will want to continue using 0 because the
behavior is better. My point here is that asking for secure random
numbers isn’t some legacy oddity — it’s genuinely necessary. The
kernel should do whatever it needs to in order to make it work.  We
really don’t want a situation where 0 means get me secure random
numbers reliably but spam the logs and GRND_SECURE_BLOCKING means
don’t spam the logs but risk deadlocking. This will encourage people
to pass 0 to get the improved behavior.

> So GRND_EXPLICIT is a bit that basically means "I am explicit about
> what behavior I want". But part of that is that you need to _state_
> the behavior too.
>
> So:
>
> - GRND_INSECURE is (GRND_EXPLICIT | GRND_NONBLOCK)
>
>   As in "I explicitly ask you to just not ever block": urandom

IMO this is confusing.  The GRND_RANDOM flag was IMO a mistake and
should just be retired.  Let's enumerate useful cases and then give
them sane values.

>
> - GRND_SECURE_BLOCKING is (GRND_EXPLICIT | GRND_RANDOM)
>
>   As in "I explicitly ask you for those secure random numbers"
>
> - GRND_SECURE_NONBLOCKING is (GRND_EXPLICIT | GRND_RANDOM | GRND_NONBLOCK)
>
>   As in "I want explicitly secure random numbers, but return -EAGAIN
> if that would block".
>
> Which are the three sane behaviors (that last one is useful for the "I
> can try to generate entropy if you don't have any" case. I'm not sure
> anybody will do it, but it definitely conceptually makes sense).
>
> And I agree that your naming is better.

I think this is the complete list of "good" behaviors for new programs:

"insecure": always works, never warns.

"secure, blocking": always returns *eventually* with secure output,
i.e., does something to avoid deadlocks

"secure, nonblocking" returns secure output immediately or returns -EAGAIN.

And the only real question is how to map existing users to these
semantics.  I see two sensible choices:

1. 0 means "secure, blocking". I think this is not what we'd do if we
could go back in time and change the ABI from day 1, but I think it's
actually good enough.  As long as this mode won't deadlock, it's not
*that* bad if programs are using it when they wanted "insecure".

2. 0 means "secure, blocking, but warn".  Some new value means
"secure, blocking, don't warn".  The problem is that new applications
will have to fall back to 0 to continue supporting old kernels.

I briefly thought that maybe GRND_RANDOM would be a reasonable choice
for "secure, blocking, don't warn", but the effect on new programs on
old kernels will be unfortunate.

I'm willing to go along with #2 if you like it better than #1, and
I'll update my patches accordingly, but I prefer #1.

I do think we should make all the ABI changes that we want to make all
in one release.  Let's not make programs think about their behavior on
more versions than necessary.  So I'd like to get rid of the current
/dev/random semantics, add "insecure" mode, and do whatever deadlock
avoidance scheme we settle on in a single release.

--Andy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 18:12                                             ` Willy Tarreau
@ 2019-09-20 19:22                                               ` Andy Lutomirski
  2019-09-20 19:37                                                 ` Willy Tarreau
  2019-09-20 20:02                                                 ` Linus Torvalds
  0 siblings, 2 replies; 211+ messages in thread
From: Andy Lutomirski @ 2019-09-20 19:22 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Andy Lutomirski, Linus Torvalds, Ahmed S. Darwish,
	Lennart Poettering, Theodore Y. Ts'o, Eric W. Biederman,
	Alexander E. Patrakov, Michael Kerrisk, Matthew Garrett, lkml,
	Ext4 Developers List, Linux API, linux-man

On Fri, Sep 20, 2019 at 11:12 AM Willy Tarreau <w@1wt.eu> wrote:
>
> Hi Andy,
>
> On Fri, Sep 20, 2019 at 10:52:30AM -0700, Andy Lutomirski wrote:
> > 2. Fix what is arguably a straight up kernel bug, not even an ABI
> > issue: when a user program is blocking in getrandom(..., 0), the
> > kernel happily sits there doing absolutely nothing and deadlocks the
> > system as a result.  This IMO isn't an ABI issue -- it's an
> > implementation problem.  How about we make getrandom() (probably
> > actually wait_for_random_bytes()) do something useful to try to seed
> > the RNG if the system is otherwise not doing IO.
>
> I thought about it as well with my old MSDOS reflexes, but here I
> doubt we can do a lot. It seems fishy to me to start to fiddle with
> various drivers from within a getrandom() syscall, we could sometimes
> even end up waiting even longer because one device is already locked,
> and when we have access there, there's not much we can do without
> risking causing some harm. On desktop systems you have a bit more
> choice than on headless systems (blink keyboard leds and time the
> interrupts, run some disk accesses when there's still a disk, get a
> copy of the last buffer of the audio input and/or output, turn on
> the microphone and/or webcam, and collect some data). Many of them
> cannot always be used. We could do some more portable stuff like scan
> and hash the totality of the RAM. But that's all quite bad and
> unreliable and at this point it's better to tell userland "here's
> what I could get for you, if you want better, do it yourself" and the
> userland can then ask the user "dear user, I really need valid entropy
> this time to generate your GPG key, please type frantically on this
> keyboard". And it will be more reliable this way in my opinion.

Perhaps userland could register a helper that takes over and does
something better?  But I think the kernel really should do something
vaguely reasonable all by itself.  If nothing else, we want the ext4
patch that provoked this whole discussion to be applied, which means
that we need to unbreak userspace somehow, and returning garbage to it
is not a good choice.

Here are some possible approaches that come to mind:

int count;
while (crng isn't inited) {
  msleep(1);
}

and modify add_timer_randomness() to at least credit a tiny bit to
crng_init_cnt.

Or we do something like intentionally triggering readahead on some
offset on the root block device.  We should definitely not trigger
*blocking* IO.

Also, I wonder if the real problem preventing the RNG from starting up
is that the crng_init_cnt threshold is too high.  We have a rather
baroque accounting system, and it seems like we can accumulate and
credit entropy for a very long time indeed without actually
considering ourselves done.

--Andy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 19:22                                               ` Andy Lutomirski
@ 2019-09-20 19:37                                                 ` Willy Tarreau
  2019-09-20 19:52                                                   ` Andy Lutomirski
  2019-09-20 20:02                                                 ` Linus Torvalds
  1 sibling, 1 reply; 211+ messages in thread
From: Willy Tarreau @ 2019-09-20 19:37 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Ahmed S. Darwish, Lennart Poettering,
	Theodore Y. Ts'o, Eric W. Biederman, Alexander E. Patrakov,
	Michael Kerrisk, Matthew Garrett, lkml, Ext4 Developers List,
	Linux API, linux-man

On Fri, Sep 20, 2019 at 12:22:17PM -0700, Andy Lutomirski wrote:
> Perhaps userland could register a helper that takes over and does
> something better?

If userland sees the failure it can do whatever the developer/distro
packager thought suitable for the system facing this condition.

> But I think the kernel really should do something
> vaguely reasonable all by itself.

Definitely, that's what Linus' proposal was doing. Sleeping for some time
is what I call "vaguely reasonable".

> If nothing else, we want the ext4
> patch that provoked this whole discussion to be applied,

Oh absolutely!

> which means
> that we need to unbreak userspace somehow, and returning garbage to it
> is not a good choice.

It depends how it's used. I'd claim that we certainly use randoms for
other things (such as ASLR/hashtables) *before* using them to generate
long lived keys thus we can have a bit more time to get some more
entropy before reaching the point of producing these keys.

> Here are some possible approaches that come to mind:
> 
> int count;
> while (crng isn't inited) {
>   msleep(1);
> }
> 
> and modify add_timer_randomness() to at least credit a tiny bit to
> crng_init_cnt.

Without a timeout it's sure we'll still face some situations where
it blocks forever, which is the current problem.

> Or we do something like intentionally triggering readahead on some
> offset on the root block device.

You don't necessarily have such a device, especially when you're
in an initramfs. It's precisely where userland can be smarter. When
the caller is sfdisk for example, it does have more chances to try
to perform I/O than when it's a tiny http server starting to present
a configuration page.

> We should definitely not trigger *blocking* IO.

I think I agree.

> Also, I wonder if the real problem preventing the RNG from starting up
> is that the crng_init_cnt threshold is too high.  We have a rather
> baroque accounting system, and it seems like we can accumulate and
> credit entropy for a very long time indeed without actually
> considering ourselves done.

I have no opinion on this, lacking the skills to evaluate the situation.
What I can say for sure is that I've faced the non-booting issue quite a
number of times on headless systems, and conversely in the 2.4 era, my
front reverse-proxy by then had the same SSH key as 89 other machines on
the net. So there's surely a sweet spot to find between those two extremes.
I tend to think that waiting *a little bit* for the *first* random is
acceptable, even 10-15s, by the time the user starts to think about
pressing the reset button the system might finish to boot. Hashing some
RAM locations and the RTC when present can also help a little bit. If
at least my machine by then had combined the RTC's date and time with
the hash, chances for a key collision would have gone down to one over
many thousands.

Willy


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 19:12                                               ` Andy Lutomirski
@ 2019-09-20 19:51                                                 ` Linus Torvalds
  2019-09-20 20:11                                                   ` Alexander E. Patrakov
                                                                     ` (2 more replies)
  0 siblings, 3 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-20 19:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ahmed S. Darwish, Lennart Poettering, Theodore Y. Ts'o,
	Eric W. Biederman, Alexander E. Patrakov, Michael Kerrisk,
	Willy Tarreau, Matthew Garrett, lkml, Ext4 Developers List,
	Linux API, linux-man

On Fri, Sep 20, 2019 at 12:12 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> The problem is that new programs will have to try the new flag value
> and, if it returns -EINVAL, fall back to 0.  This isn't so great.

Don't be silly.

Of course they will do that, but so what? With a new kernel, they'll
get the behavior they expect. And with an old kernel, they'll get the
behavior they expect.

They'd never fall back to "0 means something I didn't want",
exactly because we'd make this new flag be the first change.

> Wait, are you suggesting that 0 means invoke jitter-entropy or
> whatever and GRND_SECURE_BLOCKING means not wait forever and deadlock?
>  That's no good -- people will want to continue using 0 because the
> behavior is better.

I assume that "not wait forever" was meant to be "wait forever".

So the one thing we have to do is break the "0 waits forever".  I
guarantee that will happen. I will override Ted if he just NAKs it,
because we simply _cannot_ continue with it.

So we absolutely _will_ come up with some way 0 ends the wait. Whether
it's _just_ a timeout, or whether it's jitter-entropy or whatever, it
will happen.

But we'll also make getrandom(0) do the annoying warning, because it's
just ambiguous. And I suspect you'll find that a lot of security
people don't really like jitter-entropy, at least not in whatever
cut-down format we'll likely have to use in the kernel.

And we'll also have to make getrandom(0) be really _timely_. Security
people would likely rather wait for minutes before they are happy with
it. But because it's a boot constraint as things are now, it will not
just be jitter-entropy, it will be _accelerated_ jitter-entropy in 15
seconds or whatever, and since it can't use up all of CPU time, it's
realistically more like "15 second timeout, but less of actual CPU
time for jitter".

We can try to be clever with a background thread and a lot of
yielding(), so that if the CPU is actually idle we'll get most of that
15 seconds for whatever jitter, but end result is that it's still
accelerated.

Do I believe we can do a good job in that kind of timeframe?
Absolutely. The whole point should be that it's still "good enough",
and as has been pointed out, that same jitter entropy that people are
worried about is just done in user space right now instead.

But do I believe that security people would prefer a non-accelerated
GRND_SECURE_BLOCKING? Yes I do. That doesn't mean that
GRND_SECURE_BLOCKING shouldn't use jitter entropy too, but it doesn't
need the same kind of "let's hurry this up because it might be during
early boot and block things".

That said, if we can all convince everybody (hah!) that jitter entropy
in the kernel would be sufficient, then we can make the whole point
entirely moot, and just say "we'll just change crng_wait() to do
jitter entropy instead and be done with it". Then any getrandom() user
will just basically wait for a (very limited) time and the system will
be happy.

If that is the case we wouldn't need new flags at all. But I don't
think you can make everybody agree to that, which is why I suspect
we'll need the new flag, and I'll just take the heat for saying "0 is
now off limits, because it does this thing that a lot of people
dislike".

> IMO this is confusing.  The GRND_RANDOM flag was IMO a mistake and
> should just be retired.  Let's enumerate useful cases and then give
> them sane values.


That's basically what I'm doing. I enumerate the new values.

But the enumerations have hidden meaning, because the actual bits do
matter. The GRND_EXPLICIT bit isn't supposed to be used by any user,
but it has the value it has because it makes old kernels return
-EINVAL.

But if people hate the bit names, we can just do an enum and be done with it:

   enum grnd_flags {
      GRND_NONBLOCK = 1,
      GRND_RANDOM, // Don't use!
      GRND_RANDOM_NONBLOCK, // Don't use!
      GRND_UNUSED,
      GRND_INSECURE,
      GRND_SECURE_BLOCKING,
      GRND_SECURE_NONBLOCKING,
   };

but the values now have a _hidden_ pattern (because we currently have
that "| GRND_NONBLOCK" pattern that I want to make sure still
continues to work, rather than give unexpected behavior in case
somebody continues to use it).

So the _only_ difference between the above and what I suggested is
that I made the bit pattern explicit rather than hidden in the value.
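
For readers following along, the hidden pattern in the enum above can be
spelled out roughly as below. This is only a sketch: the PROPOSED_ prefix
and the helper function are made up here (so nothing clashes with real
uapi headers), and the values are simply the ones implied by the enum,
not a merged ABI.

```c
#include <assert.h>   /* static_assert (C11) */

/* Existing getrandom() flag bits. */
#define PROPOSED_GRND_NONBLOCK  0x0001
#define PROPOSED_GRND_RANDOM    0x0002
/* New bit: kernels that do not know it return -EINVAL for the flag word. */
#define PROPOSED_GRND_EXPLICIT  0x0004

#define PROPOSED_GRND_INSECURE \
	(PROPOSED_GRND_EXPLICIT | PROPOSED_GRND_NONBLOCK)
#define PROPOSED_GRND_SECURE_BLOCKING \
	(PROPOSED_GRND_EXPLICIT | PROPOSED_GRND_RANDOM)
#define PROPOSED_GRND_SECURE_NONBLOCKING \
	(PROPOSED_GRND_SECURE_BLOCKING | PROPOSED_GRND_NONBLOCK)

/* The bit pattern reproduces the enum values 5, 6 and 7. */
static_assert(PROPOSED_GRND_INSECURE == 5, "matches enum position");
static_assert(PROPOSED_GRND_SECURE_BLOCKING == 6, "matches enum position");
static_assert(PROPOSED_GRND_SECURE_NONBLOCKING == 7, "matches enum position");

/* Hypothetical helper: nonzero if the caller picked one of the new modes. */
static int proposed_is_explicit(unsigned int flags)
{
	return (flags & PROPOSED_GRND_EXPLICIT) != 0;
}
```

With this layout, OR-ing the existing nonblock bit into the secure
blocking value yields the secure nonblocking value, which is the
"| GRND_NONBLOCK" habit the enum is meant to preserve.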

> And the only real question is how to map existing users to these
> semantics.  I see two sensible choices:
>
> 1. 0 means "secure, blocking". I think this is not what we'd do if we
> could go back in time and change the ABI from day 1, but I think it's
> actually good enough.  As long as this mode won't deadlock, it's not
> *that* bad if programs are using it when they wanted "insecure".

It's exactly that "as long as it won't deadlock" that is our current problem.

It *does* deadlock.

So it can't mean "blocking" in any long-term meaning.

It can mean "blocks for up to 15 seconds" or something like that. I'd
honestly prefer a smaller number, but I think 15 seconds is an
acceptable "your user space is buggy, but we won't make you think the
machine hung".

> 2. 0 means "secure, blocking, but warn".  Some new value means
> "secure, blocking, don't warn".  The problem is that new applications
> will have to fall back to 0 to continue supporting old kernels.

The same comment about blocking.

Maybe you came in in the middle, and didn't see the whole "reduced IO
patterns means that boot blocks forever" part of the original problem.

THAT is why 0 will absolutely change behaviour.

                Linus


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 19:37                                                 ` Willy Tarreau
@ 2019-09-20 19:52                                                   ` Andy Lutomirski
  0 siblings, 0 replies; 211+ messages in thread
From: Andy Lutomirski @ 2019-09-20 19:52 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Andy Lutomirski, Linus Torvalds, Ahmed S. Darwish,
	Lennart Poettering, Theodore Y. Ts'o, Eric W. Biederman,
	Alexander E. Patrakov, Michael Kerrisk, Matthew Garrett, lkml,
	Ext4 Developers List, Linux API, linux-man



> On Sep 20, 2019, at 12:37 PM, Willy Tarreau <w@1wt.eu> wrote:
> 
> On Fri, Sep 20, 2019 at 12:22:17PM -0700, Andy Lutomirski wrote:
>> Perhaps userland could register a helper that takes over and does
>> something better?
> 
> If userland sees the failure it can do whatever the developer/distro
> packager thought suitable for the system facing this condition.
> 
>> But I think the kernel really should do something
>> vaguely reasonable all by itself.
> 
> Definitely, that's what Linus' proposal was doing. Sleeping for some time
> is what I call "vaguely reasonable".

I don’t buy it. We have existing programs that can deadlock on boot. Just throwing -EAGAIN at them in a syscall that didn’t previously block does not strike me as reasonable.

> 
>> If nothing else, we want the ext4
>> patch that provoked this whole discussion to be applied,
> 
> Oh absolutely!
> 
>> which means
>> that we need to unbreak userspace somehow, and returning garbage to it
>> is not a good choice.
> 
> It depends how it's used. I'd claim that we certainly use randoms for
> other things (such as ASLR/hashtables) *before* using them to generate
> long lived keys thus we can have a bit more time to get some more
> entropy before reaching the point of producing these keys.

The problem is that we don’t know what userspace is doing with the output from getrandom(..., 0), so I think we have to be conservative. New kernels need to work with old user code. It’s okay if they’re slower to boot than they could be.

> 
>> Here are some possible approaches that come to mind:
>> 
>> int count;
>> while (crng isn't inited) {
>>  msleep(1);
>> }
>> 
>> and modify add_timer_randomness() to at least credit a tiny bit to
>> crng_init_cnt.
> 
> Without a timeout it's sure we'll still face some situations where
> it blocks forever, which is the current problem.

The point is that we keep the timer running by looping like this, which should cause add_timer_randomness() to get called continuously, which should prevent the deadlock.  I assume the deadlock is because we go into nohz-idle and we sit there with nothing happening at all.

> 
>> Or we do something like intentionally triggering readahead on some
>> offset on the root block device.
> 
> You don't necessarily have such a device, especially when you're
> in an initramfs. It's precisely where userland can be smarter. When
> the caller is sfdisk for example, it does have more chances to try
> to perform I/O than when it's a tiny http server starting to present
> a configuration page.

What I mean is: allow user code to register a usermode helper that helps get entropy. Or just convince distros to bundle some useful daemon that starts at early boot and lives in the initramfs.



* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 19:22                                               ` Andy Lutomirski
  2019-09-20 19:37                                                 ` Willy Tarreau
@ 2019-09-20 20:02                                                 ` Linus Torvalds
  1 sibling, 0 replies; 211+ messages in thread
From: Linus Torvalds @ 2019-09-20 20:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Willy Tarreau, Ahmed S. Darwish, Lennart Poettering,
	Theodore Y. Ts'o, Eric W. Biederman, Alexander E. Patrakov,
	Michael Kerrisk, Matthew Garrett, lkml, Ext4 Developers List,
	Linux API, linux-man

On Fri, Sep 20, 2019 at 12:22 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> Here are some possible approaches that come to mind:
>
> int count;
> while (crng isn't inited) {
>   msleep(1);
> }
>
> and modify add_timer_randomness() to at least credit a tiny bit to
> crng_init_cnt.

I'd love that, but we don't actually call add_timer_randomness() for timers.

Yeah, the name is misleading.

What the "timer" in add_timer_randomness() means is that we look at
the timing between calls. And we may actually have (long ago) called
it for timer interrupts. But we don't any more.

The only actual users of add_timer_randomness() are
add_input_randomness() and add_disk_randomness(). And it turns out
that even disk IO doesn't really call add_disk_randomness(), so the
only _real_ user is that keyboard input thing.

Which means that unless you sit at the machine and type things in,
add_timer_randomness() _never_ gets called.

No, the real source of entropy right now is
add_interrupt_randomness(), which is called for all device interrupts.

But note the "device interrupts" part. Not the timer interrupt. That's
special, and has its own low-level architecture rules. So only the
normal IO interrupts (like disk/network/etc).

So timers right now do not add _anything_ to the randomness pool. Not
noise, not entropy.

But yes, what you can do is a jitter entropy thing, which basically
does what you suggest, except instead of "msleep(1)" it does something
like

   while (crng isn't inited) {
       sched_yield();
       do_a_round_of_memory_accesses_etc();
       add_cycle_counter_entropy();
   }

and with a lot of handwaving you'll convince a certain amount of
people that yes, the timing of the above is unpredictable enough that
the entropy you add is real.
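
A userspace rendering of that loop might look like the sketch below. It
is an illustration only, not the kernel code and not a vetted RNG:
CLOCK_MONOTONIC stands in for a cycle counter, and jitter_rounds(),
touch_memory() and the rotate-and-xor mixing are all made up for this
example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

struct jitter_result {
	uint64_t pool;            /* mixed timing deltas; NOT a vetted random value */
	unsigned nonzero_deltas;  /* rounds in which the clock visibly advanced */
};

/* Monotonic clock in nanoseconds; stands in for a CPU cycle counter. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Walk a scratch buffer so cache and memory timing affect the next reading. */
static void touch_memory(volatile uint8_t *buf, size_t len, uint64_t salt)
{
	for (size_t i = 0; i < len; i += 64)
		buf[i] = (uint8_t)(buf[i] + salt + i);
}

/* Run the yield / memory access / timestamp loop for a fixed number of rounds. */
static struct jitter_result jitter_rounds(unsigned rounds)
{
	static uint8_t scratch[64 * 1024];
	struct jitter_result r = { 0, 0 };
	uint64_t prev = now_ns();

	assert(rounds > 0);

	for (unsigned i = 0; i < rounds; i++) {
		sched_yield();
		touch_memory(scratch, sizeof(scratch), r.pool);

		uint64_t t = now_ns();
		uint64_t delta = t - prev;
		prev = t;

		if (delta != 0)
			r.nonzero_deltas++;
		/* Rotate-and-xor; a real design would hash and estimate entropy. */
		r.pool = ((r.pool << 7) | (r.pool >> 57)) ^ delta;
	}
	return r;
}
```

The nonzero_deltas field is there because a coarse clock makes the whole
exercise pointless: if the deltas are mostly zero, nothing is being
collected no matter how the pool looks afterwards.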

             Linus


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 19:51                                                 ` Linus Torvalds
@ 2019-09-20 20:11                                                   ` Alexander E. Patrakov
  2019-09-20 20:17                                                   ` Matthew Garrett
  2019-09-20 20:51                                                   ` Andy Lutomirski
  2 siblings, 0 replies; 211+ messages in thread
From: Alexander E. Patrakov @ 2019-09-20 20:11 UTC (permalink / raw)
  To: Linus Torvalds, Andy Lutomirski
  Cc: Ahmed S. Darwish, Lennart Poettering, Theodore Y. Ts'o,
	Eric W. Biederman, Michael Kerrisk, Willy Tarreau,
	Matthew Garrett, lkml, Ext4 Developers List, Linux API,
	linux-man


[-- Attachment #1: Type: text/plain, Size: 1459 bytes --]

21.09.2019 00:51, Linus Torvalds wrote:

> And we'll also have to make getrandom(0) be really _timely_. Security
> people would likely rather wait for minutes before they are happy with
> it. But because it's a boot constraint as things are now, it will not
> just be jitter-entropy, it will be _accelerated_ jitter-entropy in 15
> seconds or whatever, and since it can't use up all of CPU time, it's
> realistically more like "15 second timeout, but less of actual CPU
> time for jitter".

I don't think that "accelerated jitter" makes sense. The jitterentropy 
hwrng that I sent earlier fills the entropy buffer in less than 2 
seconds, even with quality=4, so there is no need to accelerate it even 
more.

> That said, if we can all convince everybody (hah!) that jitter entropy
> in the kernel would be sufficient, then we can make the whole point
> entirely moot, and just say "we'll just change crng_wait() to do
> jitter entropy instead and be done with it". Then any getrandom() user
> will just basically wait for a (very limited) time and the system will
> be happy.
> 
> If that is the case we wouldn't need new flags at all. But I don't
> think you can make everybody agree to that, which is why I suspect
> we'll need the new flag, and I'll just take the heat for saying "0 is
> now off limits, because it does this thing that a lot of people
> dislike".

I 100% agree with that.

-- 
Alexander E. Patrakov


[-- Attachment #2: S/MIME cryptographic signature --]
[-- Type: application/pkcs7-signature, Size: 4052 bytes --]


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 19:51                                                 ` Linus Torvalds
  2019-09-20 20:11                                                   ` Alexander E. Patrakov
@ 2019-09-20 20:17                                                   ` Matthew Garrett
  2019-09-20 20:51                                                   ` Andy Lutomirski
  2 siblings, 0 replies; 211+ messages in thread
From: Matthew Garrett @ 2019-09-20 20:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Ahmed S. Darwish, Lennart Poettering,
	Theodore Y. Ts'o, Eric W. Biederman, Alexander E. Patrakov,
	Michael Kerrisk, Willy Tarreau, lkml, Ext4 Developers List,
	Linux API, linux-man

On Fri, Sep 20, 2019 at 12:51:12PM -0700, Linus Torvalds wrote:

> So we absolutely _will_ come up with some way 0 ends the wait. Whether
> it's _just_ a timeout, or whether it's jitter-entropy or whatever, it
> will happen.

FWIW, Zircon uses the jitter entropy generator to seed the CRNG and 
documented their findings in 
https://fuchsia.dev/fuchsia-src/zircon/jitterentropy/config-basic .
-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 19:51                                                 ` Linus Torvalds
  2019-09-20 20:11                                                   ` Alexander E. Patrakov
  2019-09-20 20:17                                                   ` Matthew Garrett
@ 2019-09-20 20:51                                                   ` Andy Lutomirski
  2019-09-20 22:44                                                     ` Linus Torvalds
  2 siblings, 1 reply; 211+ messages in thread
From: Andy Lutomirski @ 2019-09-20 20:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Ahmed S. Darwish, Lennart Poettering,
	Theodore Y. Ts'o, Eric W. Biederman, Alexander E. Patrakov,
	Michael Kerrisk, Willy Tarreau, Matthew Garrett, lkml,
	Ext4 Developers List, Linux API, linux-man

On Fri, Sep 20, 2019 at 12:51 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> > And the only real question is how to map existing users to these
> > semantics.  I see two sensible choices:
> >
> > 1. 0 means "secure, blocking". I think this is not what we'd do if we
> > could go back in time and change the ABI from day 1, but I think it's
> > actually good enough.  As long as this mode won't deadlock, it's not
> > *that* bad if programs are using it when they wanted "insecure".
>
> It's exactly that "as long as it won't deadlock" that is our current problem.
>
> It *does* deadlock.
>
> So it can't mean "blocking" in any long-term meaning.
>
> It can mean "blocks for up to 15 seconds" or something like that. I'd
> honestly prefer a smaller number, but I think 15 seconds is an
> acceptable "your user space is buggy, but we won't make you think the
> machine hung".

To be clear, when I say "blocking", I mean "blocks until we're ready,
but we make sure we're ready in a moderately timely manner".

Rather than answering everything point by point, here's an updated
mini-proposal and some thoughts.  There are two families of security
people that I think we care about.  One is the FIPS or CC or PCI
crowd, and they might, quite reasonably, demand actual hardware RNGs.
We should make the hwrng API stop sucking and they should be happy.
(This means expose an hwrng device node per physical device, IMO.)
The other is the one who wants getrandom(), etc to be convincingly
secure and is willing to do some actual analysis.  And I think we can
make them quite happy like this:

In the kernel, we have two types of requests for random numbers: a
request for "secure" bytes and a request for "insecure" bytes.
Requests for "secure" bytes can block or return -EAGAIN.  Requests for
"insecure" bytes succeed without waiting.  In addition, we have a
jitter entropy mechanism (maybe the one mjg59 referenced, maybe
Alexander's -- doesn't really matter) and we *guarantee* that jitter
entropy, by itself, is enough to get the "secure" generator working
after, say, 5s of effort.  By this, I mean that, on an idle system, it
finishes in 5s and, on a fully loaded system, it's allowed to take a
little while longer but not too much longer.

In other words, I want GRND_SECURE_BLOCKING and /dev/random reads to
genuinely always work and to genuinely never take much longer than 5s.
I don't want a special case where they fail.

The exposed user APIs are, subject to bikeshedding that can happen
later over the actual values, etc:

GRND_SECURE_BLOCKING: returns "secure" output and blocks until it's
ready.  This never fails, but it also never blocks forever.

GRND_SECURE_NONBLOCKING: same but returns -EAGAIN instead of blocking.

GRND_INSECURE: returns "insecure" output immediately.  I think we do
need this -- the "secure" mode may take a little while at early boot,
and libraries that initialize themselves with some randomness really
do want a way to get some numbers without any delay whatsoever.

0: either the same as GRND_SECURE_BLOCKING plus a warning or the
"accelerated" version.  The "accelerated" version means wait up to 2s
for secure numbers and, if there still aren't any, fall back to
"insecure".

GRND_RANDOM: either the same as 0 or the same as GRND_SECURE_BLOCKING
but with a warning.  I don't particularly care either way.
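
To make the "accelerated" semantics concrete, here is a rough userspace
approximation. It is a sketch under stated assumptions, not the proposed
kernel behaviour: random_with_deadline() is a hypothetical helper, the
10 ms polling step is arbitrary, and /dev/urandom plays the role of the
"insecure" source because it does not wait for the CRNG.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: try for "secure" bytes for up to timeout_ms, then
 * fall back to /dev/urandom. Sets *secure to 1 or 0 accordingly.
 * Returns len on success, -1 on error. len is limited to 256 bytes so
 * that each request completes in one piece.
 */
static ssize_t random_with_deadline(void *buf, size_t len, int timeout_ms,
				    int *secure)
{
	const struct timespec step = { 0, 10 * 1000 * 1000 }; /* 10 ms */
	int waited_ms = 0;

	assert(buf != NULL && secure != NULL && len <= 256);

	for (;;) {
		ssize_t n = getrandom(buf, len, GRND_NONBLOCK);

		if (n == (ssize_t)len) {
			*secure = 1;
			return n;
		}
		/* Anything other than "not ready yet" goes to the fallback. */
		if (n < 0 && errno != EAGAIN && errno != EINTR)
			break;
		if (waited_ms >= timeout_ms)
			break;
		nanosleep(&step, NULL);
		waited_ms += 10;
	}

	int fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;
	ssize_t n = read(fd, buf, len);
	close(fd);
	if (n != (ssize_t)len)
		return -1;
	*secure = 0;
	return n;
}
```

The point of the out-parameter is that the caller always learns which
kind of bytes it got, which is exactly the information plain 0 does not
convey.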

I'm okay with a well-defined semantic like I proposed for an
accelerated mode.  I don't really want to try to define what a
secure-but-not-as-secure mode means as a separate complication that
the underlying RNG needs to support forever.  I don't think the
security folks would like that either.

How does this sound?


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 20:51                                                   ` Andy Lutomirski
@ 2019-09-20 22:44                                                     ` Linus Torvalds
  2019-09-20 23:30                                                       ` Andy Lutomirski
  0 siblings, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-20 22:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ahmed S. Darwish, Lennart Poettering, Theodore Y. Ts'o,
	Eric W. Biederman, Alexander E. Patrakov, Michael Kerrisk,
	Willy Tarreau, Matthew Garrett, lkml, Ext4 Developers List,
	Linux API, linux-man

On Fri, Sep 20, 2019 at 1:51 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> To be clear, when I say "blocking", I mean "blocks until we're ready,
> but we make sure we're ready in a moderately timely manner".

.. and I want a pony.

The problem is that you start from an assumption that we simply can't
seem to do.

> In other words, I want GRND_SECURE_BLOCKING and /dev/random reads to
> genuinely always work and to genuinely never take much longer than 5s.
> I don't want a special case where they fail.

Honestly, if that's the case and we _had_ such a method of
initializing the rng, then I suspect we could just ignore the flags
entirely, with the possible exception of GRND_NONBLOCK. And even that
is "possible exception", because once your worst-case is a one-time
delay of 5s at boot time thing, you might as well consider it
nonblocking in general.

Yes, there are some in-kernel users that really can't afford to do
even that 5s delay (not only may they be atomic, but more likely it's
just that we don't want to delay _everything_ by 5s), but they don't
use the getrandom() system call anyway.

> The exposed user APIs are, subject to bikeshedding that can happen
> later over the actual values, etc:

So the thing is, you start from the impossible assumption, and _if_
you hold that assumption then we might as well just keep the existing
"zero means blocking", because nobody would mind.

I'd love to say "yes, we can guarantee good enough entropy for
everybody in 5s and we don't even need to warn about it, because
everybody will be comfortable with the state of our entropy at that
point".

It sounds like a _lovely_ model.

But honestly, it simply sounds unlikely.

Now, there are different kinds of unlikely.

In particular, if you actually have a CPU cycle counter that actually
runs at least on the same order of magnitude as the CPU frequency -
then I believe in the jitter entropy more than in many other cases.

Sadly, many platforms don't have that kind of cycle counter.

I've also not seen a hugely believable "yes, the jitter entropy is
real" paper. Alexander points to the existing jitterentropy crypto
code, and claims it can fill all our entropy needs in two seconds, but
there are big caveats:

 (a) that code uses random_get_entropy(), which on a PC is that nice
fast TSC that we want. On other platforms (or on really old PC's - we
technically support CPU's still that don't have rdtsc)? It might be
zero. Every time.

 (b) How was it tested? There are lots of randomness tests, but most
of them can be fooled with a simple counter through a cryptographic
hash - which you basically need to do anyway on whatever entropy
source you have in order to "whiten" it. It's simply _really_ hard to
decide on entropy.
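
Point (b) is easy to demonstrate. The sketch below feeds a plain counter
(zero entropy) through a mixing function and measures the fraction of
one bits, which is what a simple frequency test looks at. The mixer is a
splitmix64-style finalizer standing in for a cryptographic hash, and all
function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* splitmix64-style finalizer; a non-cryptographic stand-in for the hash. */
static uint64_t mix64(uint64_t x)
{
	x += 0x9e3779b97f4a7c15ULL;
	x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
	x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
	return x ^ (x >> 31);
}

/* Portable population count. */
static unsigned popcount64(uint64_t v)
{
	unsigned c = 0;

	while (v) {
		v &= v - 1;   /* clear the lowest set bit */
		c++;
	}
	return c;
}

/* Fraction of one bits across `samples` whitened counter values. */
static double whitened_counter_ones_fraction(uint64_t samples)
{
	uint64_t ones = 0;

	if (samples == 0)
		return 0.0;
	for (uint64_t i = 0; i < samples; i++)
		ones += popcount64(mix64(i));
	return (double)ones / (double)(samples * 64);
}
```

The fraction lands close to one half even though the input is fully
predictable, so a good-looking frequency result says nothing about how
much entropy went in.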

So it's really easy to make the randomness of some input look really
good, without any real idea how good it truly is. And maybe it really
is very very good on one particular machine, and then on another one
(with either a simpler in-order core or a lower-frequency timestamp
counter) it might be horrendously bad, and you'll never know.

So I'd love to believe in your simple model. Really. I just don't see
how to get there reliably.

Matthew Garrett pointed to one analysis on jitterentropy, and that one
wasn't all that optimistic.

I do think jitterentropy would likely be good enough in practice - at
least on PC's with a TSC - for the fairly small window at boot and
getrandom(0). As I mentioned, I don't think it will make anybody
_happy_, but it might be one of those things where it's a compromise
that at least works for people, with the key generation people who are
really unhappy with it having a new option for their case.

And maybe Alexander can convince people that when you run the
jitterentropy code a hundred billion times, the end result (not the
random stream from it, but the jitter bits themselves - but I'm not
even sure how to boil it down) - really is random.

             Linus


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 22:44                                                     ` Linus Torvalds
@ 2019-09-20 23:30                                                       ` Andy Lutomirski
  2019-09-21  3:05                                                         ` Willy Tarreau
  0 siblings, 1 reply; 211+ messages in thread
From: Andy Lutomirski @ 2019-09-20 23:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Ahmed S. Darwish, Lennart Poettering,
	Theodore Y. Ts'o, Eric W. Biederman, Alexander E. Patrakov,
	Michael Kerrisk, Willy Tarreau, Matthew Garrett, lkml,
	Ext4 Developers List, Linux API, linux-man

On Fri, Sep 20, 2019 at 3:44 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, Sep 20, 2019 at 1:51 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > To be clear, when I say "blocking", I mean "blocks until we're ready,
> > but we make sure we're ready in a moderately timely manner".
>
> .. and I want a pony.
>
> The problem is that you start from an assumption that we simply can't
> seem to do.

Eh, fair enough, I wasn't thinking about platforms without fast clocks.

I'm very nervous about allowing getrandom(..., 0) to fail with
-EAGAIN, though.  On a very, very brief search, I didn't find any
programs that would incorrectly assume it worked, but I can easily
imagine programs crashing, and that might be bad, too.  At the end of
the day, most user programmers who call getrandom() really did notice
that we flubbed the ABI, and either they were too lazy to fall back to
/dev/urandom, or they didn't want to for some reason, or they
genuinely want the blocking behavior.  And people who work with little
embedded systems without good clocks that basically can't generate
random numbers already know this, and they have little scripts to help
out.

So I think that just improving the
getrandom()-is-blocking-on-x86-and-arm behavior, adding GRND_INSECURE
and GRND_SECURE_BLOCKING, and adding the warning if 0 is passed is
good enough.  I suppose we could also have separate
GRND_SECURE_BLOCKING and GRND_SECURE_BLOCK_FOREVER.  We could also say
that, if you want to block forever, you should poll() on /dev/random
(with my patches applied, where this actually does what users would
want).

--Andy


* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 23:30                                                       ` Andy Lutomirski
@ 2019-09-21  3:05                                                         ` Willy Tarreau
  0 siblings, 0 replies; 211+ messages in thread
From: Willy Tarreau @ 2019-09-21  3:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Ahmed S. Darwish, Lennart Poettering,
	Theodore Y. Ts'o, Eric W. Biederman, Alexander E. Patrakov,
	Michael Kerrisk, Matthew Garrett, lkml, Ext4 Developers List,
	Linux API, linux-man

On Fri, Sep 20, 2019 at 04:30:20PM -0700, Andy Lutomirski wrote:
> So I think that just improving the
> getrandom()-is-blocking-on-x86-and-arm behavior, adding GRND_INSECURE
> and GRND_SECURE_BLOCKING, and adding the warning if 0 is passed is
> good enough.

I think so as well. Anyway, keep in mind that *with a sane API*,
userland can improve very quickly (faster than kernel deployments in
field). But userland developers need reliable and testable support for
features. If it's enough to do #ifndef GRND_xxx/#define GRND_xxx and
call getrandom() with these flags to detect support, it's basically 5
reliable lines of code to add to userland to make a warning disappear
and/or to allow a system that previously failed to boot to now boot. So
this gives strong incentive to userland to adopt the new API, provided
there's a way for the developer to understand what's happening (which
the warning does).
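
A minimal sketch of the detection pattern described above, under stated
assumptions: the value 0x0004 for GRND_INSECURE is taken from the flag
layout proposed in this thread, and fill_insecure() is a hypothetical
helper name, not an existing libc function. It tries the new flag first
and falls back to GRND_NONBLOCK when the running kernel rejects the
flag with EINVAL.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Assumption: value follows the flag layout proposed in this thread. */
#ifndef GRND_INSECURE
#define GRND_INSECURE 0x0004
#endif

/*
 * Fill buf with random bytes for non-cryptographic use, never blocking.
 * Returns 0 on success, -1 on failure with errno set by getrandom().
 */
static int fill_insecure(void *buf, size_t len)
{
	unsigned char *p = buf;
	unsigned int flags = GRND_INSECURE;

	while (len > 0) {
		ssize_t n = getrandom(p, len, flags);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			if (errno == EINVAL && flags == GRND_INSECURE) {
				/* Kernel does not know the flag: fall back. */
				flags = GRND_NONBLOCK;
				continue;
			}
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

On the fallback path an uninitialized CRNG still yields EAGAIN, so the
caller has to decide what to do in that case; the sketch only reports
the failure.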

If we do it right, all we'll hear are userland developers complaining
that those stupid kernel developers have changed their API again and
really don't know what they want. That will be a good sign that the
warning flows back to them and that adoption is taking.

And if the change is small enough, maybe it could make sense to backport
it to stable versions to fix boot issues. With a testable feature it
does make sense.

Willy

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-20 18:09                                             ` Linus Torvalds
  2019-09-20 18:16                                               ` Willy Tarreau
  2019-09-20 19:12                                               ` Andy Lutomirski
@ 2019-09-21  6:07                                               ` Florian Weimer
  2019-09-23 18:33                                                 ` Andy Lutomirski
  2 siblings, 1 reply; 211+ messages in thread
From: Florian Weimer @ 2019-09-21  6:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Ahmed S. Darwish, Lennart Poettering,
	Theodore Y. Ts'o, Eric W. Biederman, Alexander E. Patrakov,
	Michael Kerrisk, Willy Tarreau, Matthew Garrett, lkml,
	Ext4 Developers List, Linux API, linux-man

* Linus Torvalds:

> Violently agreed. And that's kind of what the GRND_EXPLICIT is really
> aiming for.
>
> However, it's worth noting that nobody should ever use GRND_EXPLICIT
> directly. That's just the name for the bit. The actual users would use
> GRND_INSECURE or GRND_SECURE.

Should we switch glibc's getentropy to GRND_EXPLICIT?  Or something
else?

I don't think we want to print a kernel warning for this function.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 211+ messages in thread

* RE: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-19 20:04                                         ` Linus Torvalds
  2019-09-19 20:45                                           ` Alexander E. Patrakov
@ 2019-09-23 11:55                                           ` David Laight
  1 sibling, 0 replies; 211+ messages in thread
From: David Laight @ 2019-09-23 11:55 UTC (permalink / raw)
  To: 'Linus Torvalds', Theodore Y. Ts'o
  Cc: Ahmed S. Darwish, Lennart Poettering, Eric W. Biederman,
	Alexander E. Patrakov, Michael Kerrisk, lkml, linux-ext4,
	linux-man

From: Linus Torvalds
> Sent: 19 September 2019 21:04
...
> Note small detail above: I changed the ^= to a +=. Addition tends to
> be better (due to carry between bits) when there might be bit
> commonalities.  Particularly with something like a cycle count where
> two xors can mostly cancel out previous bits rather than move bits
> around in the word.

There is code in one of the kernel RNGs that XORs together the output
of 3 LFSR (CRC) generators.
I think it is used for 'low quality' randomness and reseeded from the main RNG.
Using XOR makes the entire generator 'linear' and thus trivially reversible.
With a relatively small number of consecutive outputs you can determine the state
of all 3 LFSR.
Merge the results with addition and the process is immensely harder.
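
The cancellation effect mentioned in the quoted remark can be seen in a
small user-space sketch. The helper names mix_xor() and mix_add() are
made up for illustration and are not the kernel's mixing functions.

```c
#include <stdint.h>

/* Fold a sample into the pool with XOR: linear, and self-inverse. */
static uint64_t mix_xor(uint64_t pool, uint64_t sample)
{
	return pool ^ sample;
}

/* Fold a sample into the pool with addition: carries move bits around. */
static uint64_t mix_add(uint64_t pool, uint64_t sample)
{
	return pool + sample;
}
```

Feeding the same sample twice through mix_xor() returns the pool to its
previous value, since x ^ s ^ s == x. With mix_add() the pool ends up at
x + 2*s (modulo 2^64), so a repeated cycle count does not simply undo
the earlier contribution.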

I've also wondered whether the RC4 generator is a useful entropy store?
It has a lot of state and you can fairly easily feed in values that might (or
might not) contain any randomness without losing any stored entropy.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-21  6:07                                               ` Florian Weimer
@ 2019-09-23 18:33                                                 ` Andy Lutomirski
  2019-09-26 21:11                                                   ` Ahmed S. Darwish
  0 siblings, 1 reply; 211+ messages in thread
From: Andy Lutomirski @ 2019-09-23 18:33 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Linus Torvalds, Andy Lutomirski, Ahmed S. Darwish,
	Lennart Poettering, Theodore Y. Ts'o, Eric W. Biederman,
	Alexander E. Patrakov, Michael Kerrisk, Willy Tarreau,
	Matthew Garrett, lkml, Ext4 Developers List, Linux API,
	linux-man

On Fri, Sep 20, 2019 at 11:07 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Linus Torvalds:
>
> > Violently agreed. And that's kind of what the GRND_EXPLICIT is really
> > aiming for.
> >
> > However, it's worth noting that nobody should ever use GRND_EXPLICIT
> > directly. That's just the name for the bit. The actual users would use
> > GRND_INSECURE or GRND_SECURE.
>
> Should we switch glibc's getentropy to GRND_EXPLICIT?  Or something
> else?
>
> I don't think we want to print a kernel warning for this function.
>

Contemplating this question, I think the answer is that we should just
not introduce GRND_EXPLICIT or anything like it.  glibc is going to
have to do *something*, and getentropy() is unlikely to just go away.
The explicitly documented semantics are that it blocks if the RNG
isn't seeded.

Similarly, FreeBSD has getrandom():

https://www.freebsd.org/cgi/man.cgi?query=getrandom&sektion=2&manpath=freebsd-release-ports

and if we make getrandom(..., 0) warn, then we have a situation where
the *correct* (if regrettable) way to use the function on FreeBSD
causes a warning on Linux.

Let's just add GRND_INSECURE, make the blocking mode work better, and,
if we're feeling a bit more adventurous, add GRND_SECURE_BLOCKING as a
better replacement for 0, convince FreeBSD to add it too, and then
worry about deprecating 0 once we at least get some agreement from the
FreeBSD camp.

^ permalink raw reply	[flat|nested] 211+ messages in thread

* chaos generating driver was Re: Linux 5.3-rc8
  2019-09-14 16:30                         ` Linus Torvalds
                                             ` (2 preceding siblings ...)
  2019-09-15  6:51                           ` Lennart Poettering
@ 2019-09-23 20:49                           ` Pavel Machek
  3 siblings, 0 replies; 211+ messages in thread
From: Pavel Machek @ 2019-09-23 20:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ahmed S. Darwish, Theodore Y. Ts'o, Andreas Dilger, Jan Kara,
	Ray Strode, William Jon McCann, Alexander E. Patrakov, zhangjs,
	linux-ext4, Lennart Poettering, lkml

Hi!

> >     => src/random-seed/random-seed.c:
> >     /*
> >      * Let's make this whole job asynchronous, i.e. let's make
> >      * ourselves a barrier for proper initialization of the
> >      * random pool.
> >      */
...
> >      k = getrandom(buf, buf_size, GRND_NONBLOCK);
> >      if (k < 0 && errno == EAGAIN && synchronous) {
> >          log_notice("Kernel entropy pool is not initialized yet, "
> >                     "waiting until it is.");
> >
> >          k = getrandom(buf, buf_size, 0); /* retry synchronously */
> >      }
> 
> Yeah, the above is yet another example of completely broken garbage.
> 
> You can't just wait and block at boot. That is simply 100%
> unacceptable, and always has been, exactly because that may
> potentially mean waiting forever since you didn't do anything that
> actually is likely to add any entropy.

Hmm. This actually points to a solution, and I believe the solution is in the
kernel. Userspace is not the best place to decide on the best way to
generate entropy.

> As mentioned, this has already historically been a huge issue on
> embedded devices, and with disks turning not just to NVMe but to
> actual polling nvdimm/xpoint/flash, the amount of true "entropy"
> randomness we can give at boot is very questionable.
> 
> We can (and will) continue to do a best-effort thing (including very
> much using rdrand and friends), but the whole "wait for entropy"
> simply *must* stop.

And we can stop it... from the kernel, and without hacks, simply by generating
some entropy. We do not need to sit quietly while userspace waits for entropy
to appear.

We can, for example, do some reads from the disk (find / should be good for
generating entropy on many systems). For systems with an RTC but no timestamp
counter, we can actually just increment a register, then read it from the
interrupt to get precise timings. We know the system is blocked waiting for
entropy, so we can do expensive things we would not "normally" do.

Yes, it would probably mean a new kind of "driver" whose purpose is to generate
some kind of activity so that interrupts happen and entropy is generated... But
that is still a better solution than fixing all of userspace.

(With some proposals here, userspace _could_ do 

while (getrandom() == -EINVAL) {
    system("find / &");
    sleep(1);
}

...but I believe we really want to do it once, in kernel, and less hacky than this)
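
For reference, a compilable sketch of that kind of user-space loop with
today's interface, where getrandom(2) returns -1 and sets errno rather
than returning a negative error code. The helper name wait_for_crng()
and the stir() callback are hypothetical; only getrandom(),
GRND_NONBLOCK and EAGAIN are existing API.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Poll until the kernel CRNG is initialized, calling stir() (if given)
 * between attempts to provoke some system activity.
 * Returns 0 once getrandom() succeeds, -1 on an error other than
 * EAGAIN/EINTR or when max_tries attempts are exhausted.
 */
static int wait_for_crng(int max_tries, void (*stir)(void))
{
	unsigned char probe;
	int i;

	for (i = 0; i < max_tries; i++) {
		ssize_t n = getrandom(&probe, sizeof(probe), GRND_NONBLOCK);

		if (n == (ssize_t)sizeof(probe))
			return 0;
		if (n < 0 && errno != EAGAIN && errno != EINTR)
			return -1;
		if (stir)
			stir();
		sleep(1);
	}
	return -1;
}
```

A stir() callback could, for instance, start the "find /" run shown in
the loop above; whether that produces enough interrupt activity is
system dependent.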

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 211+ messages in thread

* [PATCH v5 0/1] random: getrandom(2): warn on large CRNG waits, introduce new flags
  2019-09-18 23:57                                   ` Linus Torvalds
  2019-09-19 14:34                                     ` Theodore Y. Ts'o
  2019-09-20 13:46                                     ` Ahmed S. Darwish
@ 2019-09-26 20:42                                     ` Ahmed S. Darwish
  2019-09-26 20:44                                       ` [PATCH v5 1/1] " Ahmed S. Darwish
  2 siblings, 1 reply; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-26 20:42 UTC (permalink / raw)
  To: Linus Torvalds, Theodore Y. Ts'o
  Cc: Florian Weimer, Willy Tarreau, Matthew Garrett, Andy Lutomirski,
	Lennart Poettering, Eric W. Biederman, Alexander E. Patrakov,
	Michael Kerrisk, lkml, linux-ext4, linux-api, linux-man

Summary / Changelog-v5:

  - Add the new flags GRND_INSECURE and GRND_SECURE_UNBOUNDED_INITIAL_WAIT
    to getrandom(2), instead of introducing a new getrandom2(2) system
    call, which nobody liked.

  - Fix a bug discovered through testing where "int ret =
    wait_event_interruptible_timeout(waitq, true, MAX_SCHEDULE_TIMEOUT)"
    returns failure (-1) due to implicit LONG_MAX => int truncation

  - WARN if a process is stuck on getrandom(,,flags=0) for more than 30
    seconds ... defconfig and bootparam configurable

  - Add documentation for "random.getrandom_wait_threshold" kernel param

  - Extra comments @ include/uapi/linux/random.h and random.c::getrandom.
    Explicit recommendations to *exclusively* use the new flags.

  - GRND_INSECURE never issues any warning, even if CRNG is not inited.
    Similarly for GRND_SECURE_UNBOUNDED_INITIAL_WAIT, no matter how
    big the unbounded wait is.
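
The truncation bug in the second changelog item can be reproduced in
user space. This is only an illustrative sketch: store_in_int() and
store_in_long() are made-up names standing in for the variable that
receives the wait primitive's long return value, and the out-of-range
conversion is implementation-defined (GCC and Clang reduce it modulo
2^32).

```c
#include <limits.h>

/*
 * Mimics "int ret = <long result>": the value is narrowed to int.
 * With a 64-bit long, LONG_MAX has all low 32 bits set, so the
 * narrowed value is -1 on GCC and Clang.
 */
static int store_in_int(long remaining)
{
	int ret = (int)remaining;

	return ret;
}

/* Fixed variant: keep the full width of the return type. */
static long store_in_long(long remaining)
{
	return remaining;
}
```

On an LP64 target, store_in_int(LONG_MAX) yields -1, which a caller
checking "ret < 0" would misread as an error, while store_in_long()
preserves the positive "time remaining" value.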

In a reply to the V4 patch, Linus posted a related patch [*] with the
following additions:

  - Drop the original random.c behavior of having each /dev/urandom
    "CRNG not inited" warning also _reset_ the crng_init_cnt entropy.

    This is not included in this patch, as IMHO this can be done as a
    separate patch on top.

 - Limit GRND_RANDOM max count/buflen to 32MB instead of 2GB.  This
   is very sane obviously, and can be done in a separate patch on
   top.

   This V5 patch just tries to be as conservative as possible.

 - GRND_WAIT_ENTROPY and GRND_EXPLICIT: AFAIK these were primarily
   added so that getrandom(,,flags=0) can be changed to return
   weaker non-blocking crypto from a non-inited CRNG in a possible
   future.

   I hope we don't have to resort to that extreme measure. Hopefully
   the WARN() in this patch will be enough to nudge distributions to
   enable more hwrng sources (RDRAND, etc.), and also for the
   user-space developers badly pointed at (hi GDM and Qt) to fix their
   code.

[*] https://lkml.kernel.org/r/CAHk-=wiCqDiU7SE3FLn2W26MS_voUAuqj5XFa1V_tiGTrrW-zQ@mail.gmail.com

Ahmed S. Darwish (1):
  random: getrandom(2): warn on large CRNG waits, introduce new flags

 .../admin-guide/kernel-parameters.txt         |   7 ++
 drivers/char/Kconfig                          |  60 ++++++++++-
 drivers/char/random.c                         | 102 +++++++++++++++---
 include/uapi/linux/random.h                   |  27 ++++-
 4 files changed, 177 insertions(+), 19 deletions(-)

--
2.23.0

^ permalink raw reply	[flat|nested] 211+ messages in thread

* [PATCH v5 1/1] random: getrandom(2): warn on large CRNG waits, introduce new flags
  2019-09-26 20:42                                     ` [PATCH v5 0/1] random: getrandom(2): warn on large CRNG waits, introduce new flags Ahmed S. Darwish
@ 2019-09-26 20:44                                       ` Ahmed S. Darwish
  2019-09-26 21:39                                         ` Andy Lutomirski
  0 siblings, 1 reply; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-26 20:44 UTC (permalink / raw)
  To: Linus Torvalds, Theodore Y. Ts'o
  Cc: Florian Weimer, Willy Tarreau, Matthew Garrett, Andy Lutomirski,
	Lennart Poettering, Eric W. Biederman, Alexander E. Patrakov,
	Michael Kerrisk, lkml, linux-ext4, linux-api, linux-man

getrandom(2) was introduced in Linux v3.17 as a new and more secure
interface for pseudorandom data requests.  It attempted to solve three
problems, as compared to /dev/urandom:

  1. the need to access filesystem paths, which can fail, e.g. under a
     chroot

  2. the need to open a file descriptor, which can fail under file
     descriptor exhaustion attacks

  3. the possibility of getting not-so-random data from /dev/urandom,
     due to an incompletely initialized kernel entropy pool

To solve the third point, getrandom(2) was made to block until a
proper amount of entropy has been accumulated to initialize the CRNG
ChaCha20 cipher.  This made the system call have no guaranteed
upper-bound for its initial waiting time.

Thus when it was introduced at c6e9d6f38894 ("random: introduce
getrandom(2) system call"), it came with a clear warning: "Any
userspace program which uses this new functionality must take care to
assure that if it is used during the boot process, that it will not
cause the init scripts or other portions of the system startup to hang
indefinitely."

Unfortunately, due to multiple factors, including not having this
warning written in a scary-enough language in the manpages, and due to
glibc since v2.25 implementing a BSD-like getentropy(3) in terms of
getrandom(2), modern user-space is calling getrandom(2) in the boot
path everywhere (e.g. Qt, GDM, etc.)

Embedded Linux systems were first hit by this, and reports of embedded
systems "getting stuck at boot" began to be common.  Over time, the
issue began to even creep into consumer-level x86 laptops: mainstream
distributions, like Debian Buster, began to recommend installing
haveged as a duct-tape workaround... just to let the system boot.

Moreover, filesystem optimizations in EXT4 and XFS, e.g. b03755ad6f33
("ext4: make __ext4_get_inode_loc plug"), which batched the inode
table I/O issued during directory lookups, and very fast systemd
boots, further exacerbated the problem by limiting interrupt-based
entropy sources. This led to large delays until the kernel's
cryptographic random number generator (CRNG) got initialized.

On a Thinkpad E480 x86 laptop and an ArchLinux user-space, the ext4
commit earlier mentioned reliably blocked the system on GDM boot.
Mitigate the problem, as a first step, in two ways:

  1. Issue a big WARN_ON when any process gets stuck on getrandom(2)
     for more than CONFIG_RANDOM_GETRANDOM_WAIT_THRESHOLD_SEC seconds.

  2. Introduce new getrandom(2) flags, with clear semantics that can
     hopefully guide user-space in doing the right thing.

Set CONFIG_RANDOM_GETRANDOM_WAIT_THRESHOLD_SEC to a heuristic 30-second
default value. System integrators and distribution builders are strongly
encouraged not to increase it much: during system boot, you either
have entropy, or you don't. And if you didn't have entropy, it will
stay like this forever, because if you had, you wouldn't have blocked
in the first place. It's an atomic "either/or" situation, with no
middle ground. Please think twice.

For the new getrandom(2) flags, be much more explicit.  As Linus
mentioned several times in the bug report thread, Linux should've
never provided /dev/random and the getrandom(GRND_RANDOM) APIs. These
interfaces are broken by design due to their almost-permanent
blockage, leading to the current misuse of /dev/urandom and
getrandom(flags=0) calls. Thus introduce the flags:

  1. GRND_INSECURE
  2. GRND_SECURE_UNBOUNDED_INITIAL_WAIT

where both extract randomness _exclusively_ from the urandom source.

Due to the explicit semantics of these new flags, GRND_INSECURE will
never issue a kernel warning message even if the CRNG is not yet
inited.  Similarly, GRND_SECURE_UNBOUNDED_INITIAL_WAIT will never
cause any kernel WARN, no matter how large the unbounded wait is.

Reported-by: Ahmed S. Darwish <darwish.07@gmail.com>
Link: https://lkml.kernel.org/r/20190910042107.GA1517@darwi-home-pc
Link: https://lkml.kernel.org/r/20190912034421.GA2085@darwi-home-pc
Link: https://lkml.kernel.org/r/20190914222432.GC19710@mit.edu
Link: https://lkml.kernel.org/r/20180514003034.GI14763@thunk.org
Link: https://lkml.kernel.org/r/CAHk-=wjyH910+JRBdZf_Y9G54c1M=LBF8NKXB6vJcm9XjLnRfg@mail.gmail.com
Link: https://lkml.kernel.org/r/20190917052438.GA26923@1wt.eu
Link: https://lkml.kernel.org/r/20190917160844.GC31567@gardel-login
Link: https://lkml.kernel.org/r/CAHk-=wjABG3+daJFr4w3a+OWuraVcZpi=SMUg=pnZ+7+O0E2FA@mail.gmail.com
Link: https://lkml.kernel.org/r/CAHk-=wjQeiYu8Q_wcMgM-nAcW7KsBfG1+90DaTD5WF2cCeGCgA@mail.gmail.com
Link: https://factorable.net ("Widespread Weak Keys in Network Devices")
Link: https://man.openbsd.org/man4/random.4
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
---
 .../admin-guide/kernel-parameters.txt         |   7 ++
 drivers/char/Kconfig                          |  60 ++++++++++-
 drivers/char/random.c                         | 102 +++++++++++++++---
 include/uapi/linux/random.h                   |  27 ++++-
 4 files changed, 177 insertions(+), 19 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6ef205fd7c97..d82eafc6a62a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3728,6 +3728,13 @@
 			fully seed the kernel's CRNG. Default is controlled
 			by CONFIG_RANDOM_TRUST_CPU.

+	random.getrandom_wait_threshold=
+			Maximum amount, in seconds, for a process to block
+			in a getrandom(,,flags=0) system call without a loud
+			warning in the kernel logs. Default is controlled by
+			CONFIG_RANDOM_GETRANDOM_WAIT_THRESHOLD_SEC. Check
+			the config option help text for more information.
+
 	ras=option[,option,...]	[KNL] RAS-specific options

 		cec_disable	[X86]
diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index df0fc997dc3e..adc9bc63d27c 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -535,8 +535,6 @@ config ADI
 	  and SSM (Silicon Secured Memory).  Intended consumers of this
 	  driver include crash and makedumpfile.

-endmenu
-
 config RANDOM_TRUST_CPU
 	bool "Trust the CPU manufacturer to initialize Linux's CRNG"
 	depends on X86 || S390 || PPC
@@ -559,4 +557,60 @@ config RANDOM_TRUST_BOOTLOADER
 	device randomness. Say Y here to assume the entropy provided by the
 	booloader is trustworthy so it will be added to the kernel's entropy
 	pool. Otherwise, say N here so it will be regarded as device input that
-	only mixes the entropy pool.
\ No newline at end of file
+	only mixes the entropy pool.
+
+config RANDOM_GETRANDOM_WAIT_THRESHOLD_SEC
+	int
+	default 30
+	help
+	  The getrandom(2) system call, when asking for entropy from the
+	  urandom source, blocks until the kernel's Cryptographic Random
+	  Number Generator (CRNG) gets initialized. This configuration
+	  option sets the maximum wait time, in seconds, for a process
+	  to get blocked on such a system call before the kernel issues
+	  a loud warning. Rationale follows:
+
+	  When the getrandom(2) system call was created, it came with
+	  the clear warning: "Any userspace program which uses this new
+	  functionality must take care to assure that if it is used
+	  during the boot process, that it will not cause the init
+	  scripts or other portions of the system startup to hang
+	  indefinitely."
+
+	  Unfortunately, due to multiple factors, including not having
+	  this warning written in a scary-enough language in the
+	  manpages, and due to glibc since v2.25 implementing a BSD-like
+	  getentropy(3) in terms of getrandom(2), modern user-space is
+	  calling getrandom(2) in the boot path everywhere.
+
+	  Embedded Linux systems were first hit by this, and reports of
+	  embedded systems "getting stuck at boot" began to be
+	  common. Over time, the issue began to even creep into consumer
+	  level x86 laptops: mainstream distributions, like Debian
+	  Buster, began to recommend installing haveged as a workaround,
+	  just to let the system boot.
+
+	  Filesystem optimizations in EXT4 and XFS exacerbated the
+	  problem, due to aggressive batching of IO requests, and thus
+	  minimizing sources of entropy at boot. This led to large
+	  delays until the kernel's CRNG got initialized.
+
+	  System integrators and distribution builders are not
+	  encouraged to considerably increase this value: during system
+	  boot, you either have entropy, or you don't. And if you didn't
+	  have entropy, it will stay like this forever, because if you
+	  had, you wouldn't have blocked in the first place. It's an
+	  atomic "either/or" situation, with no middle ground. Please
+	  think twice.
+
+	  Ideally, systems would be configured with hardware random
+	  number generators, and/or configured to trust the CPU-provided
+	  RNG's (CONFIG_RANDOM_TRUST_CPU) or boot-loader provided ones
+	  (CONFIG_RANDOM_TRUST_BOOTLOADER).  In addition, userspace
+	  should generate cryptographic keys only as late as possible,
+	  when they are needed, instead of during early boot.  For
+	  non-cryptographic use cases, such as dictionary seeds or MIT
+	  Magic Cookies, the getrandom(GRND_INSECURE) system call,
+	  or even random(3), may be more appropriate.
+
+endmenu
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 566922df4b7b..37c00cff1c08 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -322,6 +322,7 @@
 #include <linux/interrupt.h>
 #include <linux/mm.h>
 #include <linux/nodemask.h>
+#include <linux/sched.h>
 #include <linux/spinlock.h>
 #include <linux/kthread.h>
 #include <linux/percpu.h>
@@ -854,12 +855,21 @@ static void invalidate_batched_entropy(void);
 static void numa_crng_init(void);

 static bool trust_cpu __ro_after_init = IS_ENABLED(CONFIG_RANDOM_TRUST_CPU);
+static int getrandom_wait_threshold __ro_after_init =
+				CONFIG_RANDOM_GETRANDOM_WAIT_THRESHOLD_SEC;
+
 static int __init parse_trust_cpu(char *arg)
 {
 	return kstrtobool(arg, &trust_cpu);
 }
 early_param("random.trust_cpu", parse_trust_cpu);

+static int __init parse_getrandom_wait_threshold(char *arg)
+{
+	return kstrtoint(arg, 0, &getrandom_wait_threshold);
+}
+early_param("random.getrandom_wait_threshold", parse_getrandom_wait_threshold);
+
 static void crng_initialize(struct crng_state *crng)
 {
 	int		i;
@@ -1960,7 +1970,7 @@ random_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 }

 static ssize_t
-urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
+_urandom_read(char __user *buf, size_t nbytes, bool warn_on_noninited_crng)
 {
 	unsigned long flags;
 	static int maxwarn = 10;
@@ -1968,7 +1978,7 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)

 	if (!crng_ready() && maxwarn > 0) {
 		maxwarn--;
-		if (__ratelimit(&urandom_warning))
+		if (warn_on_noninited_crng && __ratelimit(&urandom_warning))
 			printk(KERN_NOTICE "random: %s: uninitialized "
 			       "urandom read (%zd bytes read)\n",
 			       current->comm, nbytes);
@@ -1982,6 +1992,13 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 	return ret;
 }

+static ssize_t
+urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
+{
+	/* warn on non-inited CRNG */
+	return _urandom_read(buf, nbytes, true);
+}
+
 static __poll_t
 random_poll(struct file *file, poll_table * wait)
 {
@@ -2118,13 +2135,55 @@ const struct file_operations urandom_fops = {
 	.llseek = noop_llseek,
 };

+static int geturandom_wait(char __user *buf, size_t count,
+			   bool warn_on_large_wait)
+{
+	long ret, timeout = MAX_SCHEDULE_TIMEOUT;
+
+	if (warn_on_large_wait && (getrandom_wait_threshold > 0))
+		timeout = HZ * getrandom_wait_threshold;
+
+	do {
+		ret = wait_event_interruptible_timeout(crng_init_wait,
+						       crng_ready(),
+						       timeout);
+		if (ret < 0)
+			return ret;
+
+		if (ret == 0) {
+			WARN(1, "random: %s[%d]: getrandom(%zu bytes) "
+			     "is blocked for more than %d seconds. Check "
+			     "getrandom_wait(7)\n", current->comm,
+			     task_pid_nr(current), count,
+			     getrandom_wait_threshold);
+
+			/* warn once per caller */
+			timeout = MAX_SCHEDULE_TIMEOUT;
+		}
+
+	} while (ret == 0);
+
+	return _urandom_read(buf, count, true);
+}
+
 SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
 		unsigned int, flags)
 {
-	int ret;
+	unsigned int i, invalid_combs[] = {
+		GRND_INSECURE|GRND_SECURE_UNBOUNDED_INITIAL_WAIT,
+		GRND_INSECURE|GRND_RANDOM,
+	};

-	if (flags & ~(GRND_NONBLOCK|GRND_RANDOM))
+	if (flags & ~(GRND_NONBLOCK |
+		      GRND_RANDOM   |
+		      GRND_INSECURE |
+		      GRND_SECURE_UNBOUNDED_INITIAL_WAIT)) {
 		return -EINVAL;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(invalid_combs); i++)
+		if ((flags & invalid_combs[i]) == invalid_combs[i])
+			return -EINVAL;

 	if (count > INT_MAX)
 		count = INT_MAX;
@@ -2132,14 +2191,33 @@ SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
 	if (flags & GRND_RANDOM)
 		return _random_read(flags & GRND_NONBLOCK, buf, count);

-	if (!crng_ready()) {
-		if (flags & GRND_NONBLOCK)
+	/*
+	 * urandom: explicit request *not* to wait for CRNG init, and
+	 * thus no "uninitialized urandom read" warnings.
+	 */
+	if (flags & GRND_INSECURE)
+		return _urandom_read(buf, count, false);
+
+	/* urandom: nonblocking access */
+	if ((flags & GRND_NONBLOCK) && !crng_ready())
 			return -EAGAIN;
-		ret = wait_for_random_bytes();
-		if (unlikely(ret))
-			return ret;
-	}
-	return urandom_read(NULL, buf, count, NULL);
+
+	/*
+	 * urandom: explicit request *to* wait for CRNG init, and thus
+	 * no "getrandom is blocked for more than X seconds" warnings
+	 * on large waits.
+	 */
+	if (flags & GRND_SECURE_UNBOUNDED_INITIAL_WAIT)
+		return geturandom_wait(buf, count, false);
+
+	/*
+	 * urandom: *implicit* request to wait for CRNG init (flags=0)
+	 *
+	 * User-space has been badly abusing this by calling getrandom
+	 * with flags=0 in the boot path, and thus blocking system
+	 * boots forever in absence of entropy. Warn on large waits.
+	 */
+	return geturandom_wait(buf, count, true);
 }

 /********************************************************************
@@ -2458,4 +2536,4 @@ void add_bootloader_randomness(const void *buf, unsigned int size)
 	else
 		add_device_randomness(buf, size);
 }
-EXPORT_SYMBOL_GPL(add_bootloader_randomness);
\ No newline at end of file
+EXPORT_SYMBOL_GPL(add_bootloader_randomness);
diff --git a/include/uapi/linux/random.h b/include/uapi/linux/random.h
index 26ee91300e3e..5a3df92270a7 100644
--- a/include/uapi/linux/random.h
+++ b/include/uapi/linux/random.h
@@ -8,6 +8,7 @@
 #ifndef _UAPI_LINUX_RANDOM_H
 #define _UAPI_LINUX_RANDOM_H

+#include <linux/bits.h>
 #include <linux/types.h>
 #include <linux/ioctl.h>
 #include <linux/irqnr.h>
@@ -23,7 +24,7 @@
 /* Get the contents of the entropy pool.  (Superuser only.) */
 #define RNDGETPOOL	_IOR( 'R', 0x02, int [2] )

-/*
+/*
  * Write bytes into the entropy pool and add to the entropy count.
  * (Superuser only.)
  */
@@ -47,10 +48,28 @@ struct rand_pool_info {
 /*
  * Flags for getrandom(2)
  *
+ * 0			discouraged - don't use (see below)
  * GRND_NONBLOCK	Don't block and return EAGAIN instead
- * GRND_RANDOM		Use the /dev/random pool instead of /dev/urandom
+ * GRND_RANDOM		discouraged - don't use (uses /dev/random pool)
+ * GRND_INSECURE	Use urandom pool, never block even if CRNG isn't inited
+ * GRND_SECURE_UNBOUNDED_INITIAL_WAIT
+ *			Use urandom pool, block until CRNG is inited
+ *
+ * User-space has been badly abusing getrandom(flags=0) by calling
+ * it in the boot path, and thus blocking system boots forever in
+ * the absence of entropy (a blocked system cannot generate more
+ * entropy, by definition).
+ *
+ * Thus if a process blocks on a getrandom(flags=0), waiting for
+ * more than CONFIG_RANDOM_GETRANDOM_WAIT_THRESHOLD_SEC seconds,
+ * the kernel will issue a loud warning.
+ *
+ * In general, don't use flags=0. Always use either GRND_INSECURE
+ * or GRND_SECURE_UNBOUNDED_INITIAL_WAIT instead.
  */
-#define GRND_NONBLOCK	0x0001
-#define GRND_RANDOM	0x0002
+#define GRND_NONBLOCK				BIT(0)
+#define GRND_RANDOM				BIT(1)
+#define GRND_INSECURE				BIT(2)
+#define GRND_SECURE_UNBOUNDED_INITIAL_WAIT	BIT(3)

 #endif /* _UAPI_LINUX_RANDOM_H */
--
2.23.0
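
For readers following the flag checks in the patch above, a user-space
mirror of the validation logic might look like the sketch below. It is
not part of the patch: check_getrandom_flags() is a hypothetical helper,
and the two new flag values are copied from the BIT(2)/BIT(3)
definitions proposed in this posting.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>

/* Assumption: values mirror the definitions proposed in this patch. */
#ifndef GRND_INSECURE
#define GRND_INSECURE 0x0004
#endif
#ifndef GRND_SECURE_UNBOUNDED_INITIAL_WAIT
#define GRND_SECURE_UNBOUNDED_INITIAL_WAIT 0x0008
#endif

/*
 * Returns 0 if flags is a combination the patched syscall would
 * accept, -EINVAL for unknown bits or mutually exclusive flags.
 */
static int check_getrandom_flags(unsigned int flags)
{
	static const unsigned int invalid_combs[] = {
		GRND_INSECURE | GRND_SECURE_UNBOUNDED_INITIAL_WAIT,
		GRND_INSECURE | GRND_RANDOM,
	};
	const unsigned int known = GRND_NONBLOCK | GRND_RANDOM |
				   GRND_INSECURE |
				   GRND_SECURE_UNBOUNDED_INITIAL_WAIT;
	size_t i;

	if (flags & ~known)
		return -EINVAL;

	for (i = 0; i < sizeof(invalid_combs) / sizeof(invalid_combs[0]); i++)
		if ((flags & invalid_combs[i]) == invalid_combs[i])
			return -EINVAL;

	return 0;
}
```

With these rules, flags=0 and each new flag on its own pass the check,
while GRND_INSECURE combined with either GRND_RANDOM or
GRND_SECURE_UNBOUNDED_INITIAL_WAIT is rejected.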

^ permalink raw reply	[flat|nested] 211+ messages in thread

* Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
  2019-09-23 18:33                                                 ` Andy Lutomirski
@ 2019-09-26 21:11                                                   ` Ahmed S. Darwish
  0 siblings, 0 replies; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-26 21:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Florian Weimer, Linus Torvalds, Lennart Poettering,
	Theodore Y. Ts'o, Eric W. Biederman, Alexander E. Patrakov,
	Michael Kerrisk, Willy Tarreau, Matthew Garrett, lkml,
	Ext4 Developers List, Linux API, linux-man

On Mon, Sep 23, 2019 at 11:33:21AM -0700, Andy Lutomirski wrote:
> On Fri, Sep 20, 2019 at 11:07 PM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * Linus Torvalds:
> >
> > > Violently agreed. And that's kind of what the GRND_EXPLICIT is really
> > > aiming for.
> > >
> > > However, it's worth noting that nobody should ever use GRND_EXPLICIT
> > > directly. That's just the name for the bit. The actual users would use
> > > GRND_INSECURE or GRND_SECURE.
> >
> > Should we switch glibc's getentropy to GRND_EXPLICIT?  Or something
> > else?
> >
> > I don't think we want to print a kernel warning for this function.
> >
> 
> Contemplating this question, I think the answer is that we should just
> not introduce GRND_EXPLICIT or anything like it.  glibc is going to
> have to do *something*, and getentropy() is unlikely to just go away.
> The explicitly documented semantics are that it blocks if the RNG
> isn't seeded.
> 
> Similarly, FreeBSD has getrandom():
> 
> https://www.freebsd.org/cgi/man.cgi?query=getrandom&sektion=2&manpath=freebsd-release-ports
> 
> and if we make getrandom(..., 0) warn, then we have a situation where
> the *correct* (if regrettable) way to use the function on FreeBSD
> causes a warning on Linux.
> 
> Let's just add GRND_INSECURE, make the blocking mode work better, and,
> if we're feeling a bit more adventurous, add GRND_SECURE_BLOCKING as a
> better replacement for 0, ...

This is what's now done in the just-submitted V5, except the "make the
blocking mode work better" part:

    https://lkml.kernel.org/r/20190926204217.GA1366@pc

It's a very conservative patch so far IMHO (minus the loud warning).

Thanks,
--
Ahmed Darwish

* Re: [PATCH v5 1/1] random: getrandom(2): warn on large CRNG waits, introduce new flags
  2019-09-26 20:44                                       ` [PATCH v5 1/1] " Ahmed S. Darwish
@ 2019-09-26 21:39                                         ` Andy Lutomirski
  2019-09-28  9:30                                           ` Ahmed S. Darwish
  0 siblings, 1 reply; 211+ messages in thread
From: Andy Lutomirski @ 2019-09-26 21:39 UTC (permalink / raw)
  To: Ahmed S. Darwish, Linus Torvalds, Theodore Y. Ts'o
  Cc: Florian Weimer, Willy Tarreau, Matthew Garrett,
	Lennart Poettering, Eric W. Biederman, Alexander E. Patrakov,
	Michael Kerrisk, lkml, linux-ext4, linux-api, linux-man

On 9/26/19 1:44 PM, Ahmed S. Darwish wrote:
> Since Linux v3.17, getrandom(2) has been created as a new and more
> secure interface for pseudorandom data requests.  It attempted to
> solve three problems, as compared to /dev/urandom:
> 
>    1. the need to access filesystem paths, which can fail, e.g. under a
>       chroot
> 
>    2. the need to open a file descriptor, which can fail under file
>       descriptor exhaustion attacks
> 
>    3. the possibility of getting not-so-random data from /dev/urandom,
>       due to an incompletely initialized kernel entropy pool
> 
> To solve the third point, getrandom(2) was made to block until a
> proper amount of entropy has been accumulated to initialize the CRNG
> ChaCha20 cipher.  This made the system call have no guaranteed
> upper-bound for its initial waiting time.
> 
> Thus when it was introduced at c6e9d6f38894 ("random: introduce
> getrandom(2) system call"), it came with a clear warning: "Any
> userspace program which uses this new functionality must take care to
> assure that if it is used during the boot process, that it will not
> cause the init scripts or other portions of the system startup to hang
> indefinitely."
> 
> Unfortunately, due to multiple factors, including not having this
> warning written in a scary-enough language in the manpages, and due to
> glibc since v2.25 implementing a BSD-like getentropy(3) in terms of
> getrandom(2), modern user-space is calling getrandom(2) in the boot
> path everywhere (e.g. Qt, GDM, etc.)
> 
> Embedded Linux systems were first hit by this, and reports of embedded
> systems "getting stuck at boot" began to be common.  Over time, the
> issue began to even creep into consumer-level x86 laptops: mainstream
> distributions, like Debian Buster, began to recommend installing
> haveged as a duct-tape workaround... just to let the system boot.
> 
> Moreover, filesystem optimizations in EXT4 and XFS, e.g. b03755ad6f33
> ("ext4: make __ext4_get_inode_loc plug"), which merged directory
> lookup code inode table IO, and very fast systemd boots, further
> exaggerated the problem by limiting interrupt-based entropy sources.
> This led to large delays until the kernel's cryptographic random
> number generator (CRNG) got initialized.
> 
> On a Thinkpad E480 x86 laptop and an ArchLinux user-space, the ext4
> commit earlier mentioned reliably blocked the system on GDM boot.
> Mitigate the problem, as a first step, in two ways:
> 
>    1. Issue a big WARN_ON when any process gets stuck on getrandom(2)
>       for more than CONFIG_GETRANDOM_WAIT_THRESHOLD_SEC seconds.
> 
>    2. Introduce new getrandom(2) flags, with clear semantics that can
>       hopefully guide user-space in doing the right thing.
> 
> Set CONFIG_GETRANDOM_WAIT_THRESHOLD_SEC to a heuristic 30-second
> default value. System integrators and distribution builders are deeply
> encouraged not to increase it much: during system boot, you either
> have entropy, or you don't. And if you didn't have entropy, it will
> stay like this forever, because if you had, you wouldn't have blocked
> in the first place. It's an atomic "either/or" situation, with no
> middle ground. Please think twice.

So what do we expect glibc's getentropy() to do?  If it just adds the 
new flag to shut up the warning, we haven't really accomplished much.

* Re: Linux 5.3-rc8
  2019-09-18 20:26                                                                                     ` Linus Torvalds
  2019-09-18 22:12                                                                                       ` Willy Tarreau
@ 2019-09-27 13:57                                                                                       ` Lennart Poettering
  2019-09-27 15:58                                                                                         ` Linus Torvalds
  1 sibling, 1 reply; 211+ messages in thread
From: Lennart Poettering @ 2019-09-27 13:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander E. Patrakov, Eric W. Biederman, Ahmed S. Darwish,
	Theodore Y. Ts'o, Willy Tarreau, Matthew Garrett,
	Vito Caputo, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, linux-ext4, lkml

On Mi, 18.09.19 13:26, Linus Torvalds (torvalds@linux-foundation.org) wrote:

> On Wed, Sep 18, 2019 at 1:15 PM Alexander E. Patrakov
> <patrakov@gmail.com> wrote:
> >
> > No, this is not the solution, if we take seriously not only getrandom
> > hangs, but also urandom warnings. In some setups (root on LUKS is one of
> > them) they happen early in the initramfs. Therefore "restoring" entropy
> > from the previous boot by a script that runs from the main system is too
> > late. That's why it is suggested to load at least a part of the random
> > seed in the boot loader, and that has not been commonly implemented.
>
> Honestly, I think the bootloader suggestion is naive and silly too.
>
> Yes, we now support it. And no, I don't think people will trust that
> either. And I suspect for good reason: there's really very little
> reason to believe that bootloaders would be any better than any other
> part of the system.
>
> So right now some people trust bootloaders exactly _because_ there
> basically is just one or two that do this, and the people who use them
> are usually the people who wrote them or are at least closely
> associated with them. That will change, and then people will say "why
> would I trust that, when we know of bug Xyz".

Doing the random seed in the boot loader is nice for two reasons:

1. It runs very very early, so that the OS can come up with fully
   initialized entropy right from the beginning.

2. The boot loader generally has found some disk to read the kernel from,
   i.e. has a place where stuff can be stored and which can be updated
   (most modern boot loaders can write to disk these days, and so can
   EFI). Thus, it can derive a new random seed from a stored seed on disk
   and pass it to the OS *AND* update it right away on disk ensuring that
   it is never reused again. The point where the OS kernel comes to an
   equivalent point where it can write to disk is much much later,
   i.e. after the initrd, after the transition to the actual OS, only
   after /var has been remounted writable.

So to me this is not about trust, but about "first place we can read
*AND* write a seed on disk".

i.e. the key to grok here: it's not OK to use a stored seed unless you
can at the same time update it on disk, as only that protects you
from reusing the key if the system's startup is aborted due to power
failure or such.
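
The ordering constraint above (rotate the stored seed on disk before the
old one is used) can be sketched in user space as follows. This is only a
minimal illustration under stated assumptions, not what any boot loader
actually does: the function name consume_seed, the seed path, and the
"next"/"boot" labels are all hypothetical.

```python
import hashlib
import os
import tempfile


def consume_seed(path):
    """Rotate the seed stored at `path`, then return material for this boot.

    The replacement seed reaches stable storage before the caller gets
    anything, so an aborted boot cannot lead to the same seed being reused.
    """
    with open(path, "rb") as f:
        old = f.read()

    # Two values derived from the old seed under different (hypothetical)
    # labels; learning one does not reveal the other.
    next_seed = hashlib.sha256(old + b"next").digest()
    boot_seed = hashlib.sha256(old + b"boot").digest()

    # Write the replacement to a temporary file in the same directory,
    # flush it to disk, then atomically rename it over the old seed.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(next_seed)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise

    # Persist the rename itself by syncing the containing directory.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

    # Only now is it safe to hand out seed material.
    return boot_seed
```

If power is lost before os.replace() completes, the old seed is still on
disk but was never handed out; if it is lost afterwards, the next boot
starts from the already-rotated seed.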

> Adding an EFI variable (or other platform nonvolatile thing), and
> reading (and writing to it) purely from the kernel ends up being one
> of those things where you can then say "ok, if we trust the platform
> AT ALL, we can trust that". Since you can't reasonably do things like
> add EFI variables to your distro image by mistake.

NVRAM backing EFI vars sucks. Nothing you want to update on every
cycle. It's OK to update during OS installation, but during every
single boot? I'd rather not.

Lennart

--
Lennart Poettering, Berlin

* Re: Linux 5.3-rc8
  2019-09-27 13:57                                                                                       ` Lennart Poettering
@ 2019-09-27 15:58                                                                                         ` Linus Torvalds
  2019-09-29  9:05                                                                                           ` Lennart Poettering
  0 siblings, 1 reply; 211+ messages in thread
From: Linus Torvalds @ 2019-09-27 15:58 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Alexander E. Patrakov, Eric W. Biederman, Ahmed S. Darwish,
	Theodore Y. Ts'o, Willy Tarreau, Matthew Garrett,
	Vito Caputo, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, Ext4 Developers List, lkml

On Fri, Sep 27, 2019 at 6:57 AM Lennart Poettering <mzxreary@0pointer.de> wrote:
>
> Doing the random seed in the boot loader is nice for two reasons:
>
> 1. It runs very very early, so that the OS can come up with fully
>    initialized entropy right from the beginning.

Oh, that part I love.

But I don't believe in your second case:

> 2. The boot loader generally has found some disk to read the kernel from,
>    i.e. has a place where stuff can be stored and which can be updated
>    (most modern boot loaders can write to disk these days, and so can
>    EFI). Thus, it can derive a new random seed from a stored seed on disk
>    and pass it to the OS *AND* update it right away on disk ensuring that
>    it is never reused again.

No. This is absolutely no different at all from user space doing it
early with a file.

All the same "golden image" issues exist, and in general the less the
boot loader writes to disk, the better.

Plus it doesn't actually work anyway in the one situation where people
_really_ want it - embedded devices, where the kernel image is quite
possibly in read-only flash that needs major setup for updates.

PLUS.

Your "it can update it right away on disk" is just crazy talk. With
WHAT? It has no randomness to play with, and it doesn't have time to
do jitter entropy stuff.

So all it can do is a really bad job at taking the previous random
seed, doing some transformation on it, and add a little bit of
whatever system randomness it can find. None of which is any better
than what the kernel can do.

End result: you'd need to have the kernel update whatever bootloader
data later on, and I'm not seeing that happening. Afaik the current
bootloader interface has no way to specify how to update it when you
actually have better randomness.

> NVRAM backing EFI vars sucks. Nothing you want to update on every
> cycle. It's OK to update during OS installation, but during every
> single boot? I'd rather not.

I do agree that EFI nvram isn't wonderful, but hopefully nonvolatile
storage is improving, and it's conceptually the right thing.

                  Linus

* Re: [PATCH v5 1/1] random: getrandom(2): warn on large CRNG waits, introduce new flags
  2019-09-26 21:39                                         ` Andy Lutomirski
@ 2019-09-28  9:30                                           ` Ahmed S. Darwish
  0 siblings, 0 replies; 211+ messages in thread
From: Ahmed S. Darwish @ 2019-09-28  9:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Theodore Y. Ts'o, Florian Weimer,
	Willy Tarreau, Matthew Garrett, Lennart Poettering,
	Eric W. Biederman, Alexander E. Patrakov, Michael Kerrisk, lkml,
	linux-ext4, linux-api, linux-man

On Thu, Sep 26, 2019 at 02:39:44PM -0700, Andy Lutomirski wrote:
> On 9/26/19 1:44 PM, Ahmed S. Darwish wrote:
> > Since Linux v3.17, getrandom(2) has been created as a new and more
> > secure interface for pseudorandom data requests.  It attempted to
> > solve three problems, as compared to /dev/urandom:
> > 
> >    1. the need to access filesystem paths, which can fail, e.g. under a
> >       chroot
> > 
> >    2. the need to open a file descriptor, which can fail under file
> >       descriptor exhaustion attacks
> > 
> >    3. the possibility of getting not-so-random data from /dev/urandom,
> >       due to an incompletely initialized kernel entropy pool
> > 
> > To solve the third point, getrandom(2) was made to block until a
> > proper amount of entropy has been accumulated to initialize the CRNG
> > ChaCha20 cipher.  This made the system call have no guaranteed
> > upper-bound for its initial waiting time.
> > 
> > Thus when it was introduced at c6e9d6f38894 ("random: introduce
> > getrandom(2) system call"), it came with a clear warning: "Any
> > userspace program which uses this new functionality must take care to
> > assure that if it is used during the boot process, that it will not
> > cause the init scripts or other portions of the system startup to hang
> > indefinitely."
> > 
> > Unfortunately, due to multiple factors, including not having this
> > warning written in a scary-enough language in the manpages, and due to
> > glibc since v2.25 implementing a BSD-like getentropy(3) in terms of
> > getrandom(2), modern user-space is calling getrandom(2) in the boot
> > path everywhere (e.g. Qt, GDM, etc.)
> > 
> > Embedded Linux systems were first hit by this, and reports of embedded
> > systems "getting stuck at boot" began to be common.  Over time, the
> > issue began to even creep into consumer-level x86 laptops: mainstream
> > distributions, like Debian Buster, began to recommend installing
> > haveged as a duct-tape workaround... just to let the system boot.
> > 
> > Moreover, filesystem optimizations in EXT4 and XFS, e.g. b03755ad6f33
> > ("ext4: make __ext4_get_inode_loc plug"), which merged directory
> > lookup code inode table IO, and very fast systemd boots, further
> > exaggerated the problem by limiting interrupt-based entropy sources.
> > This led to large delays until the kernel's cryptographic random
> > number generator (CRNG) got initialized.
> > 
> > On a Thinkpad E480 x86 laptop and an ArchLinux user-space, the ext4
> > commit earlier mentioned reliably blocked the system on GDM boot.
> > Mitigate the problem, as a first step, in two ways:
> > 
> >    1. Issue a big WARN_ON when any process gets stuck on getrandom(2)
> >       for more than CONFIG_GETRANDOM_WAIT_THRESHOLD_SEC seconds.
> > 
> >    2. Introduce new getrandom(2) flags, with clear semantics that can
> >       hopefully guide user-space in doing the right thing.
> > 
> > Set CONFIG_GETRANDOM_WAIT_THRESHOLD_SEC to a heuristic 30-second
> > default value. System integrators and distribution builders are deeply
> > encouraged not to increase it much: during system boot, you either
> > have entropy, or you don't. And if you didn't have entropy, it will
> > stay like this forever, because if you had, you wouldn't have blocked
> > in the first place. It's an atomic "either/or" situation, with no
> > middle ground. Please think twice.
> 
> So what do we expect glibc's getentropy() to do?  If it just adds the new
> flag to shut up the warning, we haven't really accomplished much.

Yes, if glibc adds GRND_SECURE_UNBOUNDED_INITIAL_WAIT to getentropy(3),
then this exercise would indeed be invalidated. Hopefully,
coordination with glibc will be done so it won't happen... @Florian?

Afterwards, a sane approach would be for getentropy(3) to be deprecated,
and to add getentropy_secure_unbounded_initial_wait(3) and
getentropy_insecure(3).

Note that this V5 patch does not claim to fully solve the problem, but
it will:

  1. Pinpoint the processes causing system boots to block
  
  2. Tell people which correct alternative to use when facing problem
     #1 above, through the proposed getrandom_wait(7) manpage. That
     manpage will fully describe the problem, and advise
     user-space to either use the new getrandom flags, or the new
     glibc getentropy_*() variants.
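
For illustration only: user-space that must not stall the boot path can
already probe the CRNG with the existing GRND_NONBLOCK flag instead of
waiting. The sketch below uses Python's os.getrandom() wrapper (Linux
only); the helper name try_random_bytes is hypothetical, and the flags
proposed in this patch are deliberately not used, since they are not
merged.

```python
import os


def try_random_bytes(n=16):
    """Return n random bytes, or None if the kernel CRNG is not yet seeded.

    With GRND_NONBLOCK, getrandom(2) fails with EAGAIN instead of blocking
    while the CRNG is uninitialized; Python raises BlockingIOError for it.
    Requests of up to 256 bytes are returned in full once it is seeded.
    """
    try:
        return os.getrandom(n, os.GRND_NONBLOCK)
    except BlockingIOError:
        return None


data = try_random_bytes()
if data is None:
    # Caller decides: defer the work, or fall back to a non-cryptographic
    # source if the use case really does not need secure randomness.
    print("CRNG not initialized yet")
else:
    print("got", len(data), "random bytes")
```

The point is that the "block or not" decision becomes explicit in the
caller, rather than an accident of which library function was used.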

thanks,

--
Ahmed Darwish

* Re: Linux 5.3-rc8
  2019-09-27 15:58                                                                                         ` Linus Torvalds
@ 2019-09-29  9:05                                                                                           ` Lennart Poettering
  0 siblings, 0 replies; 211+ messages in thread
From: Lennart Poettering @ 2019-09-29  9:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander E. Patrakov, Eric W. Biederman, Ahmed S. Darwish,
	Theodore Y. Ts'o, Willy Tarreau, Matthew Garrett,
	Vito Caputo, Andreas Dilger, Jan Kara, Ray Strode,
	William Jon McCann, zhangjs, Ext4 Developers List, lkml

On Fr, 27.09.19 08:58, Linus Torvalds (torvalds@linux-foundation.org) wrote:

> On Fri, Sep 27, 2019 at 6:57 AM Lennart Poettering <mzxreary@0pointer.de> wrote:
> >
> > Doing the random seed in the boot loader is nice for two reasons:
> >
> > 1. It runs very very early, so that the OS can come up with fully
> >    initialized entropy right from the beginning.
>
> Oh, that part I love.
>
> But I don't believe in your second case:
>
> > 2. The boot loader generally has found some disk to read the kernel from,
> >    i.e. has a place where stuff can be stored and which can be updated
> >    (most modern boot loaders can write to disk these days, and so can
> >    EFI). Thus, it can derive a new random seed from a stored seed on disk
> >    and pass it to the OS *AND* update it right away on disk ensuring that
> >    it is never reused again.
>
> No. This is absolutely no different at all from user space doing it
> early with a file.
>
> All the same "golden image" issues exist, and in general the less the
> boot loader writes to disk, the better.
>
> Plus it doesn't actually work anyway in the one situation where people
> _really_ want it - embedded devices, where the kernel image is quite
> possibly in read-only flash that needs major setup for updates.
>
> PLUS.
>
> Your "it can update it right away on disk" is just crazy talk. With
> WHAT? It has no randomness to play with, and it doesn't have time to
> do jitter entropy stuff.

So these two issues are addressed by the logic implemented in sd-boot
(systemd's boot loader) like this:

The old seed is read off the ESP seed file. We then calculate two hash
sums in counter mode from it (SHA256), one we pass to the OS as seed
to initialize the random pool from. The other we use to update the ESP
seed file with. Unless you are capable of breaking SHA256 this means
the seed passed to the OS and the new seed stored on disk are derived
from the same seed but in a way you cannot determine one if you
managed to learn the other. Moreover, on each boot you are guaranteed
to get two new seeds, each time, and you cannot derive the sums used
on previous boots from those. This means we are robust towards
potential seed reuse when turning the system forcibly off during boot.

Now, what's still missing in the above is protection against "golden
image" issues, as you correctly pointed out. To deal with that the
SHA256 sums are not just hashed from the old seed and the counter, but
also include a system specific "system token" (you may also call it
"salt") which is stored in an EFI variable, persistently, which was
created once, during system installation. This hence gives you the
behaviour you are looking for, using the NVRAM like you suggested,
but we don't need to write the EFI vars all the time, as instead we
update the seed file stored in the ESP each time, and updating the ESP
should be safer and less problematic (i.e. if everything is done right
it's a single sector write).

To make this safer, on EFI firmwares that support the RNG protocol we
also include some data derived from that in the hash, just for good
measure. To summarize:

NEWDISKSEED = SHA256(OLDDISKSEED || SYSTEMTOKEN || EFIRNGVAL || "1")
SEEDFORLINUX = SHA256(OLDDISKSEED || SYSTEMTOKEN || EFIRNGVAL || "2")

(and no, this is not a crypto scheme I designed, but something
Dr. Bertram Poettering (my brother, a cryptographer) suggested)
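
As a rough sketch of the two formulas above, assuming "||" means plain
byte concatenation, that all inputs have fixed lengths (so the
concatenation is unambiguous), and that the counters are the ASCII bytes
"1" and "2". The function and parameter names are hypothetical, and this
is not the actual sd-boot code:

```python
import hashlib


def derive_seeds(old_disk_seed, system_token, efi_rng_val=b""):
    """Derive (new_disk_seed, seed_for_linux) from the stored seed.

    efi_rng_val is left empty when the firmware offers no RNG protocol.
    """
    base = old_disk_seed + system_token + efi_rng_val
    new_disk_seed = hashlib.sha256(base + b"1").digest()
    seed_for_linux = hashlib.sha256(base + b"2").digest()
    return new_disk_seed, seed_for_linux


# Two simulated boots of one installation (fixed demo values only).
token = bytes(32)          # per-installation system token
disk_seed = bytes(range(32))

disk_seed, boot1 = derive_seeds(disk_seed, token)
disk_seed, boot2 = derive_seeds(disk_seed, token)
assert boot1 != boot2      # each boot gets a fresh seed

# A cloned "golden image" with a different system token diverges at once.
_, clone = derive_seeds(bytes(range(32)), b"\x01" * 32)
assert clone != boot1
```

Both outputs come from the same inputs and differ only in the trailing
counter, which is what keeps the seed handed to the OS unrelated to the
one written back to disk.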

> So all it can do is a really bad job at taking the previous random
> seed, doing some transformation on it, and add a little bit of
> whatever system randomness it can find. None of which is any better
> than what the kernel can do.

Well, the kernel cannot hash and rewrite the old seed file early enough,
it's that simple. It can do that only when /var becomes writable,
i.e. very late during boot, much later than when we need entropy
for. The boot loader, on the other hand, can hash and rewrite the old seed
file even before the kernel initializes, and that's the big benefit!

> End result: you'd need to have the kernel update whatever bootloader
> data later on, and I'm not seeing that happening. Afaik the current
> bootloader interface has no way to specify how to update it when you
> actually have better randomness.

So, you could, but don't have to update the ESP random seed file from
the OS too, every now and then, but the security of the model does not
rely on that.

(And yes, the above doesn't help if you have a fully R/O medium, but
those tend to be embedded devices, and I am much less concerned about
those, the designers really can deal with the RNG seed issues
themselves, and maybe provide some hw to do it; it's the generic user
PCs that we should be concerned about, and for those the above should
generally work)

Lennart

--
Lennart Poettering, Berlin

* Re: Linux 5.3-rc8
  2019-09-10  4:21 ` Linux 5.3-rc8 Ahmed S. Darwish
  2019-09-10 11:33   ` Linus Torvalds
  2019-09-10 11:56   ` Theodore Y. Ts'o
@ 2019-10-03 21:10   ` Jon Masters
  2019-10-03 21:31   ` Jon Masters
  3 siblings, 0 replies; 211+ messages in thread
From: Jon Masters @ 2019-10-03 21:10 UTC (permalink / raw)
  To: Ahmed S. Darwish, Theodore Ts'o, Andreas Dilger, Linus Torvalds
  Cc: Jan Kara, zhangjs, linux-ext4, linux-kernel

On 9/10/19 12:21 AM, Ahmed S. Darwish wrote:

> Can this even be considered a user-space breakage? I'm honestly not
> sure. On my modern RDRAND-capable x86, just running rng-tools rngd(8)
> early-on fixes the problem. I'm not sure about the status of older
> CPUs though.

Tangent: I asked aloud on Twitter last night if anyone had exploited
Rowhammer-like effects to generate entropy...and sure enough, the usual
suspects have: https://arxiv.org/pdf/1808.04286.pdf

While this requires low level access to a memory controller, it's
perhaps an example of something a platform designer could look at as a
source to introduce boot-time entropy for e.g. EFI_RNG_PROTOCOL even on
an existing platform without dedicated hardware for the purpose.

Just a thought.

Jon.

* Re: Linux 5.3-rc8
  2019-09-10  4:21 ` Linux 5.3-rc8 Ahmed S. Darwish
                     ` (2 preceding siblings ...)
  2019-10-03 21:10   ` Jon Masters
@ 2019-10-03 21:31   ` Jon Masters
  3 siblings, 0 replies; 211+ messages in thread
From: Jon Masters @ 2019-10-03 21:31 UTC (permalink / raw)
  To: Ahmed S. Darwish, Theodore Ts'o, Andreas Dilger, Linus Torvalds
  Cc: Jan Kara, zhangjs, linux-ext4, linux-kernel

On 9/10/19 12:21 AM, Ahmed S. Darwish wrote:

> Can this even be considered a user-space breakage? I'm honestly not
> sure. On my modern RDRAND-capable x86, just running rng-tools rngd(8)
> early-on fixes the problem. I'm not sure about the status of older
> CPUs though.

Tangent: I asked aloud on Twitter last night if anyone had exploited
Rowhammer-like effects to generate entropy...and sure enough, the usual
suspects have: https://arxiv.org/pdf/1808.04286.pdf

While this requires low level access to a memory controller, it's
perhaps an example of something a platform designer could look at as a
source to introduce boot-time entropy for e.g. EFI_RNG_PROTOCOL even on
an existing platform without dedicated hardware for the purpose.

Just a thought.

Jon.

Thread overview: 211+ messages
     [not found] <CAHk-=whBQ+6c-h+htiv6pp8ndtv97+45AH9WvdZougDRM6M4VQ@mail.gmail.com>
2019-09-10  4:21 ` Linux 5.3-rc8 Ahmed S. Darwish
2019-09-10 11:33   ` Linus Torvalds
2019-09-10 12:21     ` Linus Torvalds
2019-09-10 17:33     ` Ahmed S. Darwish
2019-09-10 17:47       ` Reindl Harald
2019-09-10 18:21       ` Linus Torvalds
2019-09-11 16:07         ` Theodore Y. Ts'o
2019-09-11 16:45           ` Linus Torvalds
2019-09-11 17:00             ` Linus Torvalds
2019-09-11 17:36               ` Theodore Y. Ts'o
2019-09-12  3:44                 ` Ahmed S. Darwish
2019-09-12  8:25                   ` Theodore Y. Ts'o
2019-09-12 11:34                     ` Linus Torvalds
2019-09-12 11:58                       ` Willy Tarreau
2019-09-14 12:25                       ` [PATCH RFC] random: getrandom(2): don't block on non-initialized entropy pool Ahmed S. Darwish
2019-09-14 14:08                         ` Alexander E. Patrakov
2019-09-15  5:22                           ` [PATCH RFC v2] random: optionally block in getrandom(2) when the CRNG is uninitialized Theodore Y. Ts'o
2019-09-15  8:17                             ` [PATCH RFC v3] random: getrandom(2): optionally block when " Ahmed S. Darwish
2019-09-15  8:59                               ` Lennart Poettering
2019-09-15  9:30                                 ` Willy Tarreau
2019-09-15 10:02                                   ` Ahmed S. Darwish
2019-09-15 10:40                                     ` Willy Tarreau
2019-09-15 10:55                                       ` Ahmed S. Darwish
2019-09-15 11:17                                         ` Willy Tarreau
2019-09-15 17:32                             ` [PATCH RFC v2] random: optionally block in getrandom(2) when the " Linus Torvalds
2019-09-15 18:32                               ` Willy Tarreau
2019-09-15 18:36                                 ` Willy Tarreau
2019-09-15 19:08                                   ` Linus Torvalds
2019-09-15 19:18                                     ` Willy Tarreau
2019-09-15 19:31                                       ` Linus Torvalds
2019-09-15 19:54                                         ` Willy Tarreau
2019-09-15 18:59                                 ` Linus Torvalds
2019-09-15 19:12                                   ` Willy Tarreau
2019-09-16  2:45                                   ` Ahmed S. Darwish
2019-09-16 18:08                               ` Lennart Poettering
2019-09-16 19:16                                 ` Willy Tarreau
2019-09-18 21:15                               ` [PATCH RFC v4 0/1] random: WARN on large getrandom() waits and introduce getrandom2() Ahmed S. Darwish
2019-09-18 21:17                                 ` [PATCH RFC v4 1/1] " Ahmed S. Darwish
2019-09-18 23:57                                   ` Linus Torvalds
2019-09-19 14:34                                     ` Theodore Y. Ts'o
2019-09-19 15:20                                       ` Linus Torvalds
2019-09-19 15:50                                         ` Linus Torvalds
2019-09-20 13:13                                           ` Theodore Y. Ts'o
2019-09-19 20:04                                         ` Linus Torvalds
2019-09-19 20:45                                           ` Alexander E. Patrakov
2019-09-19 21:47                                             ` Linus Torvalds
2019-09-19 22:23                                               ` Alexander E. Patrakov
2019-09-19 23:44                                                 ` Alexander E. Patrakov
2019-09-20 13:16                                                 ` Theodore Y. Ts'o
2019-09-23 11:55                                           ` David Laight
2019-09-20 13:08                                         ` Theodore Y. Ts'o
2019-09-20 13:46                                     ` Ahmed S. Darwish
2019-09-20 14:33                                       ` Andy Lutomirski
2019-09-20 16:29                                         ` Linus Torvalds
2019-09-20 17:52                                           ` Andy Lutomirski
2019-09-20 18:09                                             ` Linus Torvalds
2019-09-20 18:16                                               ` Willy Tarreau
2019-09-20 19:12                                               ` Andy Lutomirski
2019-09-20 19:51                                                 ` Linus Torvalds
2019-09-20 20:11                                                   ` Alexander E. Patrakov
2019-09-20 20:17                                                   ` Matthew Garrett
2019-09-20 20:51                                                   ` Andy Lutomirski
2019-09-20 22:44                                                     ` Linus Torvalds
2019-09-20 23:30                                                       ` Andy Lutomirski
2019-09-21  3:05                                                         ` Willy Tarreau
2019-09-21  6:07                                               ` Florian Weimer
2019-09-23 18:33                                                 ` Andy Lutomirski
2019-09-26 21:11                                                   ` Ahmed S. Darwish
2019-09-20 18:12                                             ` Willy Tarreau
2019-09-20 19:22                                               ` Andy Lutomirski
2019-09-20 19:37                                                 ` Willy Tarreau
2019-09-20 19:52                                                   ` Andy Lutomirski
2019-09-20 20:02                                                 ` Linus Torvalds
2019-09-20 18:15                                             ` Alexander E. Patrakov
2019-09-20 18:29                                               ` Andy Lutomirski
2019-09-20 17:26                                       ` Willy Tarreau
2019-09-20 17:56                                         ` Ahmed S. Darwish
2019-09-26 20:42                                     ` [PATCH v5 0/1] random: getrandom(2): warn on large CRNG waits, introduce new flags Ahmed S. Darwish
2019-09-26 20:44                                       ` [PATCH v5 1/1] " Ahmed S. Darwish
2019-09-26 21:39                                         ` Andy Lutomirski
2019-09-28  9:30                                           ` Ahmed S. Darwish
2019-09-14 15:02                       ` Linux 5.3-rc8 Ahmed S. Darwish
2019-09-14 16:30                         ` Linus Torvalds
2019-09-14 16:35                           ` Alexander E. Patrakov
2019-09-14 16:52                             ` Linus Torvalds
2019-09-14 17:09                               ` Alexander E. Patrakov
2019-09-14 19:19                                 ` Linus Torvalds
2019-09-15  6:56                               ` Lennart Poettering
2019-09-15  7:01                                 ` Willy Tarreau
2019-09-15  7:05                                   ` Lennart Poettering
2019-09-15  7:07                                     ` Willy Tarreau
2019-09-15  8:34                                       ` Lennart Poettering
2019-09-15 17:02                                 ` Linus Torvalds
2019-09-16  3:23                                   ` Theodore Y. Ts'o
2019-09-16  3:40                                     ` Linus Torvalds
2019-09-16  3:56                                       ` Linus Torvalds
2019-09-16 17:00                                       ` Theodore Y. Ts'o
2019-09-16 17:07                                         ` Linus Torvalds
2019-09-14 21:11                           ` Ahmed S. Darwish
2019-09-14 22:05                             ` Martin Steigerwald
2019-09-14 22:24                             ` Theodore Y. Ts'o
2019-09-14 22:32                               ` Linus Torvalds
2019-09-15  1:00                                 ` Theodore Y. Ts'o
2019-09-15  1:10                                   ` Linus Torvalds
2019-09-15  2:05                                     ` Theodore Y. Ts'o
2019-09-15  2:11                                       ` Linus Torvalds
2019-09-15  6:33                                       ` Willy Tarreau
2019-09-15  6:53                                       ` Willy Tarreau
2019-09-15  6:51                           ` Lennart Poettering
2019-09-15  7:27                             ` Ahmed S. Darwish
2019-09-15  8:48                               ` Lennart Poettering
2019-09-15 16:29                             ` Linus Torvalds
2019-09-16  1:40                               ` Ahmed S. Darwish
2019-09-16  1:48                                 ` Vito Caputo
2019-09-16  2:49                                   ` Theodore Y. Ts'o
2019-09-16  4:29                                     ` Willy Tarreau
2019-09-16  5:02                                       ` Linus Torvalds
2019-09-16  6:12                                         ` Willy Tarreau
2019-09-16 16:17                                           ` Linus Torvalds
2019-09-16 17:21                                             ` Theodore Y. Ts'o
2019-09-16 17:44                                               ` Linus Torvalds
2019-09-16 17:55                                                 ` Serge Belyshev
2019-09-16 19:08                                                 ` Willy Tarreau
2019-09-16 23:02                                                 ` Matthew Garrett
2019-09-16 23:05                                                   ` Linus Torvalds
2019-09-16 23:11                                                     ` Matthew Garrett
2019-09-16 23:13                                                       ` Alexander E. Patrakov
2019-09-16 23:15                                                         ` Matthew Garrett
2019-09-16 23:18                                                       ` Linus Torvalds
2019-09-16 23:29                                                         ` Ahmed S. Darwish
2019-09-17  1:05                                                           ` Linus Torvalds
2019-09-17  1:23                                                             ` Matthew Garrett
2019-09-17  1:41                                                               ` Linus Torvalds
2019-09-17  1:46                                                                 ` Matthew Garrett
2019-09-17  5:24                                                                   ` Willy Tarreau
2019-09-17  7:33                                                                     ` Martin Steigerwald
2019-09-17  8:35                                                                       ` Willy Tarreau
2019-09-17  8:44                                                                         ` Martin Steigerwald
2019-09-17 12:11                                                                       ` Theodore Y. Ts'o
2019-09-17 12:30                                                                         ` Ahmed S. Darwish
2019-09-17 12:46                                                                           ` Alexander E. Patrakov
2019-09-17 12:47                                                                           ` Willy Tarreau
2019-09-17 16:08                                                                           ` Lennart Poettering
2019-09-17 16:23                                                                             ` Linus Torvalds
2019-09-17 16:34                                                                               ` Reindl Harald
2019-09-17 17:42                                                                               ` Lennart Poettering
2019-09-17 18:01                                                                                 ` Linus Torvalds
2019-09-17 20:28                                                                                   ` Martin Steigerwald
2019-09-17 20:52                                                                                     ` Ahmed S. Darwish
2019-09-17 21:38                                                                                       ` Martin Steigerwald
2019-09-17 21:52                                                                                         ` Matthew Garrett
2019-09-17 22:10                                                                                           ` Martin Steigerwald
2019-09-18 13:53                                                                                             ` Lennart Poettering
2019-09-19  7:28                                                                                               ` Martin Steigerwald
2019-09-17 23:08                                                                                           ` Linus Torvalds
2019-09-18 13:40                                                                                         ` Lennart Poettering
2019-09-17 20:58                                                                                   ` Linus Torvalds
2019-09-18  9:33                                                                                     ` Rasmus Villemoes
2019-09-18 10:16                                                                                       ` Willy Tarreau
2019-09-18 10:25                                                                                         ` Alexander E. Patrakov
2019-09-18 10:42                                                                                           ` Willy Tarreau
2019-09-18 19:31                                                                                       ` Linus Torvalds
2019-09-18 19:56                                                                                 ` Eric W. Biederman
2019-09-18 20:13                                                                                   ` Linus Torvalds
2019-09-18 20:15                                                                                   ` Alexander E. Patrakov
2019-09-18 20:26                                                                                     ` Linus Torvalds
2019-09-18 22:12                                                                                       ` Willy Tarreau
2019-09-27 13:57                                                                                       ` Lennart Poettering
2019-09-27 15:58                                                                                         ` Linus Torvalds
2019-09-29  9:05                                                                                           ` Lennart Poettering
2019-09-17 13:11                                                                         ` Alexander E. Patrakov
2019-09-17 13:37                                                                           ` Alexander E. Patrakov
2019-09-17 15:57                                                                         ` Lennart Poettering
2019-09-17 16:21                                                                           ` Willy Tarreau
2019-09-17 17:13                                                                             ` Lennart Poettering
2019-09-17 17:29                                                                               ` Willy Tarreau
2019-09-17 20:42                                                                                 ` Martin Steigerwald
2019-09-18 13:38                                                                                 ` Lennart Poettering
2019-09-18 13:59                                                                                   ` Alexander E. Patrakov
2019-09-18 14:50                                                                                     ` Alexander E. Patrakov
2019-09-17 20:36                                                                             ` Martin Steigerwald
2019-09-17 16:27                                                                       ` Linus Torvalds
2019-09-17 16:34                                                                         ` Matthew Garrett
2019-09-17 17:16                                                                           ` Willy Tarreau
2019-09-17 17:20                                                                             ` Matthew Garrett
2019-09-17 17:23                                                                               ` Matthew Garrett
2019-09-17 17:57                                                                               ` Willy Tarreau
2019-09-17 16:58                                                                         ` Alexander E. Patrakov
2019-09-17 17:30                                                                           ` Lennart Poettering
2019-09-17 17:32                                                                             ` Willy Tarreau
2019-09-17 17:41                                                                               ` Alexander E. Patrakov
2019-09-17 17:28                                                                         ` Lennart Poettering
2019-09-17  0:03                                                         ` Matthew Garrett
2019-09-17  0:40                                                         ` Matthew Garrett
2019-09-17  7:15                                                     ` a sane approach to random numbers (was: Re: Linux 5.3-rc8) Martin Steigerwald
2019-09-16 18:00                                               ` Linux 5.3-rc8 Alexander E. Patrakov
2019-09-16 19:53                                               ` Ahmed S. Darwish
2019-09-17 15:32                                               ` Lennart Poettering
2019-09-16  3:31                                 ` Linus Torvalds
2019-09-23 20:49                           ` chaos generating driver was " Pavel Machek
2019-09-14  9:25                     ` Ahmed S. Darwish
2019-09-14 16:27                       ` Theodore Y. Ts'o
2019-09-11 21:41             ` Ahmed S. Darwish
2019-09-11 22:37               ` Ahmed S. Darwish
2019-09-16  3:52           ` Herbert Xu
2019-09-16  4:21             ` Linus Torvalds
2019-09-16  4:53               ` Willy Tarreau
2019-09-10 11:56   ` Theodore Y. Ts'o
2019-09-16 10:33     ` Christoph Hellwig
2019-10-03 21:10   ` Jon Masters
2019-10-03 21:31   ` Jon Masters
