Re: arc4random - are you sure we want these?

From: Florian Weimer <fweimer@redhat.com>
To: "Jason A. Donenfeld via Libc-alpha" <libc-alpha@sourceware.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>,
	Yann Droneaud <ydroneaud@opteya.com>,
	Michael@phoronix.com, linux-crypto@vger.kernel.org,
	jann@thejh.net
Subject: Re: arc4random - are you sure we want these?
Date: Mon, 25 Jul 2022 14:39:24 +0200
Message-ID: <87v8rlqscj.fsf@oldenburg.str.redhat.com>
In-Reply-To: <Yt54x7uWnsL3eZSx@zx2c4.com> (Jason A. Donenfeld via Libc-alpha's message of "Mon, 25 Jul 2022 13:04:39 +0200")

* Jason A. Donenfeld via Libc-alpha:

> Hi Florian,
>
> On Mon, Jul 25, 2022 at 12:11:27PM +0200, Florian Weimer wrote:
>> > I really wonder whether this is a good idea, whether this is something
>> > that glibc wants, and whether it's a design worth committing to in the
>> > long term.
>> 
>> Do you object to the interface, or the implementation?
>> 
>> The implementation can be improved easily enough at a later date.
>
> Sort of both, as I don't think it's wise to commit to the former without
> a good idea of the full ideal space of the latter, and very clearly from
> reading that discussion, that hasn't been explored.

But we are only concerned with the application interface.  Do we really
expect that to be different from arc4random_buf and its variants?

The interface between glibc and the kernel can be changed without
impacting applications.

> In particular, Adhemerval has said you won't be committing to making
> arc4random suitable for crypto, going so far as to mention it's not a
> CSPRNG in the documentation.

Below you suggest using GRND_INSECURE to avoid deadlocks during
booting.  It's documented in the UAPI header as “Return
non-cryptographic random bytes”.  I assume it's broadly equivalent to
reading from /dev/urandom (which we need to support for backwards
compatibility, and currently use to avoid blocking).  This means that we
cannot really document the resulting bits as cryptographically strong
from an application perspective because the kernel is not willing to
make this commitment.

>> > Firstly, for what use cases does this actually help? As of recent
>> > changes to the Linux kernels -- now backported all the way to 4.9! --
>> > getrandom() and /dev/urandom are extremely fast and operate over per-cpu
>> > states locklessly. Sure you avoid a syscall by doing that in userspace,
>> > but does it really matter? Who exactly benefits from this?
>> 
>> getrandom may be fast for bulk generation.  It's not that great for
>> generating a few bits here and there.  For example, shuffling a
>> 1,000-element array takes 18 microseconds with arc4random_uniform in
>> glibc, and 255 microseconds with the naïve getrandom-based
>> implementation (with slightly biased results; measured on an Intel
>> i9-10900T, Fedora's kernel-5.18.11-100.fc35.x86_64).
>
> So maybe we should look into vDSO'ing getrandom(), if this is a problem
> for real use cases, and you find that these sorts of things are
> widespread in real code?

We can investigate that, but it doesn't change the application
interface.
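For illustration, here is a minimal sketch of what a naïve getrandom-based shuffle might look like. This is an assumption: the benchmark code itself is not part of this thread, and naive_index and naive_shuffle are hypothetical names. Each draw costs one system call, and reducing a 32-bit value with % is slightly biased whenever the bound does not divide 2^32 evenly.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical naive helper: one getrandom() system call per draw.
   Reducing with % is slightly biased unless bound divides 2^32. */
static uint32_t naive_index(uint32_t bound)
{
    uint32_t r;
    if (getrandom(&r, sizeof r, 0) != (ssize_t) sizeof r)
        abort();  /* this sketch does not retry */
    return r % bound;
}

/* Fisher-Yates shuffle driven by naive_index(). */
static void naive_shuffle(int *a, size_t n)
{
    for (size_t i = n; i > 1; i--) {
        size_t j = naive_index((uint32_t) i);  /* 0 <= j < i */
        int tmp = a[i - 1];
        a[i - 1] = a[j];
        a[j] = tmp;
    }
}
```

With glibc 2.36 or later, calling arc4random_uniform(i) from <stdlib.h> in place of naive_index(i) removes the modulo bias; whether it also avoids the per-draw system call depends on how glibc implements it.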

>> > You miss out on this with arc4random, and if that information _is_ to be
>> > exported to userspace somehow in the future, it would be awfully nice to
>> > design the userspace interface alongside the kernel one.
>> 
>> What is the kernel interface you are talking about?  From an interface
>> standpoint, arc4random_buf and getrandom are very similar, with the main
>> difference being that arc4random_buf cannot report failure (except by
>> terminating the process).
>
> Referring to information above about reseeding. So in this case it would
> be some form of a generation counter most likely. There's also been some
> discussion about exporting some aspect of the vmgenid counter to
> userspace.

We don't need any of that in userspace if the staging buffer is managed
by the kernel, which is why the thread-specific data donation is so
attractive as an approach.  The kernel knows where all these buffers are
located and can invalidate them as needed.
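To make the interface difference mentioned earlier concrete (getrandom can report failure, arc4random_buf cannot), here is a sketch of a hypothetical getrandom_all helper. It returns -1 with errno set and leaves the decision to the caller; a void, arc4random_buf-style function that hits the same error has no option other than terminating the process.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper: fill buf with n bytes from getrandom(2).
   Returns 0 on success, or -1 with errno set (for example ENOSYS
   on kernels without the system call).  Large requests may be
   satisfied only partially, and a blocking call may be interrupted
   by a signal, so both cases are retried. */
static int getrandom_all(void *buf, size_t n)
{
    unsigned char *p = buf;
    while (n > 0) {
        ssize_t r = getrandom(p, n, 0);
        if (r < 0) {
            if (errno == EINTR)
                continue;  /* interrupted: try again */
            return -1;     /* report the error to the caller */
        }
        p += r;
        n -= (size_t) r;
    }
    return 0;
}
```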

>> > Seen from this perspective, going with OpenBSD's older paradigm might be
>> > rather limiting. Why not work together, between the kernel and libc, to
>> > see if we can come up with something better, before settling on an
>> > interface with semantics that are hard to walk back later?
>> 
>> Historically, kernel developers were not interested in solving some of
>> the hard problems (especially early seeding) that prevent the use of
>> getrandom during early userspace stages.
>
> I really don't know what you're talking about here. I understood you up
> until the opening parenthesis, and initially thought to reply, "but I am
> interested! let's work together" or something, but then you mentioned
> getrandom()'s issues with early userspace, and I became confused. If you
> use getrandom(GRND_INSECURE), it won't block and you'll get bytes even
> before the rng has seeded. If you use getrandom(0), the kernel's RNG
> will use jitter to seed itself ASAP so it doesn't block forever (on
> platforms where that's possible, anyhow). Both of these qualities mostly
> predate my heavy involvement. So your statement confuses me. But with
> that said, if you do find some lack of interest on something you think
> is important, please give me a try, and maybe you'll have better luck. I
> very much am interested in solving longstanding problems in this domain.

I tried to de-escalate here, and clearly that didn't work.  The context
here is that historically, working with the “random” kernel maintainers
has been very difficult for many groups of people.  Many of us are tired
of those non-productive discussions.  I forgot that this has recently
changed on the kernel side.  I understand that it's taking years to
overcome these perceptions.  glibc is still struggling with this, too.

Regarding the technical aspect, GRND_INSECURE is somewhat new-ish, but
as I wrote above, its UAPI documentation is a bit scary.  Maybe it
would be possible to clarify this in the manual pages a bit?  I *assume*
that if we are willing to read from /dev/urandom, we can use
GRND_INSECURE right away to avoid that fallback path on sufficiently new
kernels.  But it would be nice to have confirmation.

>> > As-is, it's hard to recommend that anybody really use these functions.
>> > Just keep using getrandom(2), which has mostly favorable semantics.
>> 
>> Some applications still need to run in configurations where getrandom is
>> not available (either because the kernel is too old, or because it has
>> been disabled via seccomp).
>
> I don't quite understand this. People without getrandom() typically
> fall back to using /dev/urandom. "But what if FD in derp derp mountns
> derp rlimit derp explosion derp?!" Yes, sure, which is why getrandom()
> came about. But doesn't arc4random() fall back to using /dev/urandom in
> this exact same way? I don't see how arc4random() really changes the
> equation here, except that maybe I should amend my statement to say,
> "Just keep using getrandom(2) or /dev/urandom, which has mostly
> favorable semantics." (After all, I didn't see any wild-n-crazy fallback
> to AT_RANDOM like what systemd does with random-util.c:
> https://github.com/systemd/systemd/blob/main/src/basic/random-util.c )

I had some patches with AT_RANDOM fallback, including overwriting
AT_RANDOM with output from the seeded PRNG.  It's certainly messy.  I
probably didn't bother to post these patches given how bizarre the whole
thing was.  I did have fallback to CPU instructions, but that turned out
to be unworkable due to bugs in suspend on AMD CPUs (kernel or firmware,
unclear).
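For reference, a sketch of reading the AT_RANDOM bytes through getauxval. The kernel supplies only 16 bytes per exec, so they can serve at most as a seed, and glibc already uses them for the stack protector canary and the pointer guard. read_at_random is a hypothetical helper; the sketch only reads the bytes and does not attempt the overwriting described above.

```c
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Copy the 16 bytes that AT_RANDOM points to into out.
   Returns 0 on success, -1 if the auxiliary vector lacks the entry.
   Every call returns the same 16 bytes: this is a one-time seed,
   not a stream of fresh randomness. */
static int read_at_random(unsigned char out[16])
{
    unsigned long addr = getauxval(AT_RANDOM);
    if (addr == 0)
        return -1;
    memcpy(out, (const void *) (uintptr_t) addr, 16);
    return 0;
}
```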

> Seen in that sense, as I wrote to Paul, if you're after arc4random for
> source code compatibility -- or because you simply like its non-failing
> interface and want to commit to that no matter the costs whatsoever --
> then you could start by making that a light shim around getrandom()
> (falling back to /dev/urandom, I guess), and then we can look into ways
> of accelerating getrandom() for new kernels. This way you don't ship
> something broken out of the gate, and there's still room for
> improvement. Though I would still note that committing to the interface
> early like this comes with some concern.

The ChaCha20 generator we currently have in the tree may not be
required, true.  But this doesn't make what we have today “broken”; it's
merely overly complicated.  And replacing that with a straight buffer
from getrandom does not change the external interface, so we can do this
any time we want.
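A minimal sketch of such a straight buffer, assuming a per-thread 256-byte staging area refilled from getrandom(2); buffered_u32 and STAGE_SIZE are hypothetical. It shows where the speedup comes from (one system call per 64 draws) and also what it lacks: nothing discards the buffer after fork() or a VM snapshot restore, so a child process or a cloned VM could hand out the same bytes as its parent.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/random.h>
#include <sys/types.h>

#define STAGE_SIZE 256  /* hypothetical buffer size */

static _Thread_local unsigned char stage[STAGE_SIZE];
static _Thread_local size_t stage_pos = STAGE_SIZE;  /* start empty */

/* Hand out 32-bit values from the per-thread buffer, refilling it in
   bulk.  Caveat: no fork or VM-snapshot detection. */
static uint32_t buffered_u32(void)
{
    uint32_t v;
    if (stage_pos + sizeof v > STAGE_SIZE) {
        /* Per getrandom(2), urandom-source reads of up to 256 bytes
           return in full once the pool is initialized. */
        if (getrandom(stage, STAGE_SIZE, 0) != STAGE_SIZE)
            abort();
        stage_pos = 0;
    }
    memcpy(&v, stage + stage_pos, sizeof v);
    memset(stage + stage_pos, 0, sizeof v);  /* don't keep handed-out bytes */
    stage_pos += sizeof v;
    return v;
}
```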

>> The performance numbers suggest that we benefit from buffering in user
>> space.
>
> The question is whether it's safe and advisable to buffer this way in
> userspace. Does userspace have the right information now of when to
> discard the buffer and get a new one? I suspect it does not.

Not completely, no, but we can cover many cases.  I do not currently see
a way around that if we want to promote arc4random_uniform(limit) as a
replacement for random() % limit.
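The bias that arc4random_uniform is meant to remove can be shown with a rejection-sampling sketch in the style of OpenBSD's implementation. uniform_below, demo_vals and demo_next are hypothetical; the demo source is a fixed sequence that only makes the rejection step visible, not a random number generator. With random() % limit, the values 0..2^31-1 are folded onto [0, limit), so when limit does not divide 2^31 the low results are slightly more likely. Rejecting raw 32-bit values below 2^32 mod upper leaves a range whose size is an exact multiple of upper.

```c
#include <stddef.h>
#include <stdint.h>

/* Map a uniform 32-bit source to [0, upper) without modulo bias. */
static uint32_t uniform_below(uint32_t upper, uint32_t (*next)(void))
{
    if (upper < 2)
        return 0;  /* only one possible result */
    uint32_t min = -upper % upper;  /* 2^32 mod upper */
    for (;;) {
        uint32_t r = next();
        if (r >= min)  /* [min, 2^32) has a multiple of upper values */
            return r % upper;
    }
}

/* Demonstration-only source: cycles through a fixed sequence. */
static const uint32_t demo_vals[] = { 0u, 1u, 0xFFFFFFFFu };
static size_t demo_pos;

static uint32_t demo_next(void)
{
    return demo_vals[demo_pos++ % 3];
}
```

With upper = 3, 2^32 mod 3 is 1, so a raw value of 0 is rejected and the next value, 1, is accepted; the following draw, 0xFFFFFFFF, maps to 0.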

>> But that's an implementation detail, and something we can revisit later.
>
> No, these are not mere implementation details. When Adhemerval is
> talking about warning people in the documentation that this shouldn't be
> used for crypto, that should be a wake-up call that something is really
> off here. Don't ship things you know are broken, and then call that an
> "implementation detail" that can be hedged with "documentation".

Again, given the issues around GRND_INSECURE (the reason why it exists),
we do not have much choice on the glibc side.  And these issues will be
there for the foreseeable future, whether glibc provides arc4random or
not.

Thanks,
Florian