* Stop breaking the CSRNG
From: Kurt Roeckx @ 2019-10-02 16:55 UTC
To: linux-kernel; +Cc: Theodore Ts'o

Hi,

As OpenSSL, we want cryptographically secure random numbers. Before
getrandom(), Linux never provided a good API for that; both
/dev/random and /dev/urandom have problems. getrandom() fixed
that, so we switched to it where available.

It was possible to combine /dev/random and /dev/urandom, and get
something that worked properly. You could call select() on
/dev/random and know that both were initialized when it returned.
But then select() started returning before /dev/random was
initialized, so that if you switch to /dev/urandom, it's still
uninitialized.

A solution for that was that you could instead read 1 byte from
/dev/random, and then switch to /dev/urandom. But that also stopped
working: /dev/urandom can still be uninitialized when you can read
from /dev/random. So there no longer is a way to wait for
/dev/urandom to be initialized.

As a result of that, we now refuse to use /dev/urandom on recent
kernels, and require the use of getrandom(). (To make this work with
older userspace, this means we need to import all the different
__NR_getrandom defines, and make the system call ourselves.)

But it seems people are now thinking about breaking getrandom() too,
to let it return data when it's not initialized by default. Please
don't. If you think such a mode is useful for some applications,
let them set a flag, instead of the reverse.


Kurt
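For readers unfamiliar with the interface under discussion, the
property Kurt relies on can be sketched as follows. This is only an
illustration in Python, whose os.getrandom() wraps getrandom(2) on
Linux; it is not OpenSSL's code, and the helper name csprng_bytes is
made up for the example. With flags of 0 the call waits until the
kernel's generator has been initialized; with GRND_NONBLOCK it fails
instead of waiting, which lets the caller see that the generator is
not ready rather than silently receiving weak output.

```python
import os


def csprng_bytes(n, wait=True):
    """Return n bytes from getrandom(2) (hypothetical helper).

    wait=True  : block until the kernel generator is initialized.
    wait=False : return None instead of blocking if it is not ready.
    """
    flags = 0 if wait else os.GRND_NONBLOCK
    out = bytearray()
    while len(out) < n:
        try:
            # getrandom() may return fewer bytes than requested,
            # so keep asking until the buffer is full.
            out += os.getrandom(n - len(out), flags)
        except BlockingIOError:
            # Only raised with GRND_NONBLOCK: generator not ready yet.
            return None
    return bytes(out)


if __name__ == "__main__":
    key = csprng_bytes(32)
    print(key.hex())
```

The design point of the thread is visible in the wait=False branch:
the caller gets an explicit "not ready" answer and has to decide what
to do, instead of being handed data of unknown quality.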
* Re: Stop breaking the CSRNG
From: Theodore Y. Ts'o @ 2019-10-03 3:36 UTC
To: Kurt Roeckx; +Cc: linux-kernel

On Wed, Oct 02, 2019 at 06:55:33PM +0200, Kurt Roeckx wrote:
> 
> But it seems people are now thinking about breaking getrandom() too,
> to let it return data when it's not initialized by default. Please
> don't.

"It's complicated"

The problem is that whether a CRNG can be considered secure is a
property of the entire system, including the hardware, and given the
large number of hardware configurations on which the kernel and
OpenSSL can be used, in practice, we can't assure that getrandom(2) is
"secure" without making certain assumptions. For example, if we
assume that the CPU is an x86 processor new enough to support RDRAND,
and that RDRAND is competently implemented (e.g., it won't disappear
after a suspend/resume) and doesn't have any backdoors implanted in
it, then it's easy to say that getrandom() will always be secure.

But if you assume that there is no hardware random number generator,
and everything is driven from a single master oscillator, with no
external input, and the CPU is utterly simple, with no speculation or
anything else that might be non-deterministic, AND if we assume that
the idiots who make an IOT device use the same random seed across
millions of devices all cloned off of the same master image, there
is ***absolutely*** nothing the kernel can do to guarantee, with 100%
certainty, that the CRNG will be initialized. (This is especially
true if the idiots who design the IOT device call OpenSSL to generate
their long-term private key the moment the device is first plugged in,
before any networking device is brought on-line.)
The point with all of this is that both the kernel and OpenSSL, and
whether or not they can be combined to create a secure overall
solution, is going to be dependent on the hardware choices, and on the
choices of the distribution and the application programmers in terms
of what other software components are used, and when and where those
components try to request random numbers, especially super-early in
the boot process.

Historically, I've tried to work around this problem by being super
paranoid about the choices of thresholds before declaring the CRNG to
be initialized, while *also* making sure that at least on most common
x86 systems, the CRNG could be considered initialized before the root
file system was mounted read/write.

But over time, assumptions about what is common hardware change.
SSD's replace HDD's; NAPI and other polling techniques are more common
to reduce the number of interrupts; the use of a single master
oscillator to drive all of the various clocks on the system, etc.
And software changes --- systemd running boot scripts in parallel
means that boot times are reduced, which is good, but it also means
the time to when the root is mounted read/write is much shortened.

So in the absence of a hardware RNG, or a hardware random number
generator which is considered trusted (i.e., should RDRAND be
considered trusted?), there *will* be times when we will simply fail
to be able to generate secure random numbers (at least by our
heuristics, which can potentially be overly optimistic on some
hardware platforms, and overly conservative on others).

The question is then, what do we do? Do we hang the boot --- at which
point users will complain to Linus? Or do we just hope that things
are "good enough", and that even if the user has elected to say that
they don't trust RDRAND, that we'll hope it's competently implemented
and not backdoored?
Or do we assume that using a jitter entropy scheme is actually secure,
as opposed to security through obscurity (and maybe is completely
pointless on a simple and completely open architecture with no
speculation such as RISC-V)?

There really are no good choices here. The one thing which Linus has
made very clear is that hanging at boot is Not Acceptable.

Long term, the best we can do is to throw the kitchen sink at the
problem. So we should try to use UEFI's RNG if available; use the
TPM's RNG if available; use RDRAND if available; try to use a seed
file if available (and hope it's not cloned to be identical on a
million IOT devices); and so on. Hopefully, they won't *all* be
incompetently implemented and/or implanted with a backdoor from the
NSA or MSS or the KGB.

The only words of hope that I can give you are that it's likely that
there are so many zero day bugs in the kernel, in userspace
applications, and crypto libraries (including maybe OpenSSL), that we
don't have to make the CRNG impossible to attack in order to make a
difference. We just have to make it harder than finding and
exploiting zero day security bugs in *other* parts of the system.

"When a mountain bear is chasing after you, you don’t have to outrun
the bear. You only have to outrun the person running next to you." :-)

Bottom line, we can do the best we can with each of our various
components, but without control over the hardware that will be in use,
or for OpenSSL, what applications are trying to call OpenSSL for, and
when they might try to generate long-term public keys during the first
boot, perfection is always going to be impossible to achieve. The
only thing we can choose is how do we handle failure.

And Linus has laid down the law that a performance improving commit
should never cause boot-ups to hang due to the lack of randomness.
Given that I can't control when some application might try to call
OpenSSL to generate a long-term public key, and OpenSSL certainly
can't control if it gets called during early boot, if getrandom(2)
ever blocks, we can't meet Linus's demand.

And given that many users are just installing some kind of userspace
jitter entropy to square this particular circle, even though I don't
trust a jitter entropy scheme, even if it is insecure, we're also
using RDRAND, and ultimately I'll trust RDRAND more than I trust a
jitter entropy scheme. And that's where we are right now. Linus has
introduced a simple in-kernel jitter entropy system so getrandom(2)
will never block.

Is it secure? Who can say? I have my doubts on RISC-V, but I don't
use a RISC-V, and hopefully this will be a spur to encourage all
RISC-V implementations to include the cryptographic extensions which
include a RDRAND-like hardware random number generator into the ISA.
And since all of *my* x86 systems have RDRAND, I'm at least personally
comfortable enough with where we've landed. Your mileage may vary.

Regards,

					- Ted
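The "kitchen sink" idea in the message above can be illustrated with
a small sketch. It is not how the kernel's CRNG mixes its inputs; it
only shows the general principle that hashing several independent
inputs together yields a seed that is unpredictable as long as at
least one input is, assuming the hash function behaves well. The
function name mix_sources and the example inputs are hypothetical
stand-ins for sources such as a hardware RNG, a seed file and timing
data.

```python
import hashlib
import os
import time


def mix_sources(sources):
    """Combine several byte strings into one 32-byte seed (illustrative).

    Each input is length-prefixed so that the boundaries between
    inputs are unambiguous: (b"ab", b"c") and (b"a", b"bc") give
    different results.
    """
    h = hashlib.sha256()
    for chunk in sources:
        h.update(len(chunk).to_bytes(8, "big"))
        h.update(chunk)
    return h.digest()


if __name__ == "__main__":
    seed = mix_sources([
        os.urandom(32),                        # stand-in for a hardware RNG
        b"contents of a saved seed file",      # stand-in for a seed file
        time.perf_counter_ns().to_bytes(8, "big"),  # stand-in for timing data
    ])
    print(seed.hex())
```

The caveat from the message still applies: if every input is
predictable (for example, a seed file cloned across a million
devices and nothing else), mixing does not create entropy that was
never there.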
* Re: Stop breaking the CSRNG
From: Kurt Roeckx @ 2019-10-03 21:14 UTC
To: Theodore Y. Ts'o; +Cc: linux-kernel

On Wed, Oct 02, 2019 at 11:36:55PM -0400, Theodore Y. Ts'o wrote:
> On Wed, Oct 02, 2019 at 06:55:33PM +0200, Kurt Roeckx wrote:
> > 
> > But it seems people are now thinking about breaking getrandom() too,
> > to let it return data when it's not initialized by default. Please
> > don't.
> 
> "It's complicated"
> 
> The problem is that whether a CRNG can be considered secure is a
> property of the entire system, including the hardware, and given the
> large number of hardware configurations on which the kernel and
> OpenSSL can be used, in practice, we can't assure that getrandom(2) is
> "secure" without making certain assumptions.

I'm not saying it's easy. But getrandom() is documented as only
returning data after it has been initialized, which is an
important property of that interface and the main reason to switch
to it.

And it seems that someone's laptop hanging during boot, because it
couldn't find enough entropy, is enough to break the security of
the rest. It seems that the only important thing is that
applications don't stop working, because when they do stop it's
clearly visible that something is not working. Returning data before
it's been initialized doesn't have the effect of being visibly
broken, but it's just as broken, which is in my opinion worse.

> But if you assume that there is no hardware random number generator,
> and everything is driven from a single master oscillator, with no
> external input, and the CPU is utterly simple, with no speculation or
> anything else that might be non-deterministic, AND if we assume that
> the idiots who make an IOT device use the same random seed across
> millions of devices all cloned off of the same master image, there
> is ***absolutely*** nothing the kernel can do to guarantee, with 100%
> certainty, that the CRNG will be initialized. (This is especially
> true if the idiots who design the IOT device call OpenSSL to generate
> their long-term private key the moment the device is first plugged in,
> before any networking device is brought on-line.)

And returning data before it's been initialized will only make
that situation worse. We can only hope that by refusing to return
data the idiot will properly fix it. If the hardware can't provide
it, the kernel shouldn't just pretend the hardware did provide it.

> There really are no good choices here. The one thing which Linus has
> made very clear is that hanging at boot is Not Acceptable.

And I think it's not a kernel problem but a combination of
hardware, configuration and user space problem. The kernel can of
course be improved, and I'm sure it will.

I wonder if it's useful to extend getrandom() to provide an option
where the application can indicate it doesn't care about security
and just wants some number, like what /dev/urandom provides but
as a system call. Another option could be that you're happy to get
data once an estimated 64 bits of entropy have been gathered.

> And given that many users are just installing some kind of userspace
> jitter entropy to square this particular circle, even though I don't
> trust a jitter entropy scheme, even if it is insecure, we're also
> using RDRAND, and ultimately I'll trust RDRAND more than I trust a
> jitter entropy scheme. And that's where we are right now. Linus has
> introduced a simple in-kernel jitter entropy system

I don't trust it much either. And I think we should at least try
to estimate how much entropy it actually provides on various
systems, knowing that there will probably be systems where it
provides much less than what we think it does.

I'm willing to help analyze data if people can provide a list of
TSCs that are being added. The more samples the better. I think
you want to do this on an idle system.


Kurt
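As a rough idea of what the kind of analysis Kurt offers above could
look like, here is a naive sketch, not a validated method. It
computes a most-common-value min-entropy estimate, -log2(p_max), over
a list of samples. The function names are hypothetical, and
time.perf_counter_ns() is only a userspace stand-in for the TSC
values the kernel would actually add. The estimate treats samples as
independent, so it is an optimistic upper bound: correlated or
patterned timings can carry far less entropy than this figure
suggests.

```python
import math
import time
from collections import Counter


def min_entropy_per_sample(samples):
    """Naive most-common-value estimate, in bits per sample.

    Returns log2(N / count_of_most_common_value), i.e. -log2(p_max).
    Assumes independent samples, so it can only overestimate.
    """
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    return math.log2(len(samples) / max(counts.values()))


def timer_deltas(n):
    """Collect n successive differences of a high-resolution counter.

    Stand-in for recording TSC deltas; real data should come from the
    kernel code path being evaluated.
    """
    prev = time.perf_counter_ns()
    deltas = []
    for _ in range(n):
        now = time.perf_counter_ns()
        deltas.append(now - prev)
        prev = now
    return deltas


if __name__ == "__main__":
    deltas = timer_deltas(100_000)
    print("distinct deltas:", len(set(deltas)))
    print("estimated bits/sample:", min_entropy_per_sample(deltas))
```

Running this on an idle machine, as suggested in the message, and
comparing the result across different hardware would give a first
impression of how much the figure varies between systems.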
* Re: Stop breaking the CSRNG
From: Pavel Machek @ 2019-10-06 12:15 UTC
To: Theodore Y. Ts'o; +Cc: Kurt Roeckx, linux-kernel

On Wed 2019-10-02 23:36:55, Theodore Y. Ts'o wrote:
> On Wed, Oct 02, 2019 at 06:55:33PM +0200, Kurt Roeckx wrote:
> > 
> > But it seems people are now thinking about breaking getrandom() too,
> > to let it return data when it's not initialized by default. Please
> > don't.
> 
> "It's complicated"
> 
> The problem is that whether a CRNG can be considered secure is a
> property of the entire system, including the hardware, and given the
> large number of hardware configurations on which the kernel and
> OpenSSL can be used, in practice, we can't assure that getrandom(2) is
> "secure" without making certain assumptions. For example, if we
> assume that the CPU is an x86 processor new enough to support RDRAND,
> and that RDRAND is competently implemented (e.g., it won't disappear
> after a suspend/resume) and doesn't have any backdoors implanted in
> it, then it's easy to say that getrandom() will always be secure.

Actually... if we have a buggy AMD CPU with broken RDRAND, we should
still be able to get enough entropy during boot so that getrandom() is
cryptographically secure.

I don't think we get that right at the moment.

> Bottom line, we can do the best we can with each of our various
> components, but without control over the hardware that will be in use,
> or for OpenSSL, what applications are trying to call OpenSSL for, and
> when they might try to generate long-term public keys during the first
> boot, perfection is always going to be impossible to achieve. The
> only thing we can choose is how do we handle failure.
> 
> And Linus has laid down the law that a performance improving commit
> should never cause boot-ups to hang due to the lack of randomness.
> Given that I can't control when some application might try to call
> OpenSSL to generate a long-term public key, and OpenSSL certainly
> can't control if it gets called during early boot, if getrandom(2)
> ever blocks, we can't meet Linus's demand.

You can. You can just access the disk while userspace is blocked on
getrandom(). ("find /").

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* RE: Stop breaking the CSRNG
From: David Laight @ 2019-10-03 10:13 UTC
To: 'Kurt Roeckx', linux-kernel; +Cc: Theodore Ts'o

From: Kurt Roeckx
> Sent: 02 October 2019 17:56
> As OpenSSL, we want cryptographically secure random numbers. Before
> getrandom(), Linux never provided a good API for that; both
> /dev/random and /dev/urandom have problems. getrandom() fixed
> that, so we switched to it where available.

The fundamental problem is that you can't always get 'cryptographically
secure random numbers'. No API changes are ever going to change that.

The system can either return an error or sleep (possibly indefinitely)
until some 'reasonably random' numbers are available.

A RISC-V system running on an FPGA (I've only used Altera NIOS cpu)
may have absolutely no sources of randomness at boot time.
Saying the architecture must include a random number instruction
doesn't help!
Generating random bits inside the FPGA is somewhere between 'difficult'
and impossible (forcing metastability between clock domains might work).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
* Re: Stop breaking the CSRNG
From: Adam Borowski @ 2019-10-03 11:51 UTC
To: David Laight; +Cc: 'Kurt Roeckx', linux-kernel, Theodore Ts'o

On Thu, Oct 03, 2019 at 10:13:39AM +0000, David Laight wrote:
> From: Kurt Roeckx
> > Sent: 02 October 2019 17:56
> > As OpenSSL, we want cryptographically secure random numbers. Before
> > getrandom(), Linux never provided a good API for that; both
> > /dev/random and /dev/urandom have problems. getrandom() fixed
> > that, so we switched to it where available.
> 
> The fundamental problem is that you can't always get 'cryptographically
> secure random numbers'. No API changes are ever going to change that.
> 
> The system can either return an error or sleep (possibly indefinitely)
> until some 'reasonably random' numbers are available.
> 
> A RISC-V system running on an FPGA (I've only used Altera NIOS cpu)
> may have absolutely no sources of randomness at boot time.

I'd say this is a hardware security vulnerability; no different from e.g.
having no or faulty MMU, speculation that allows exfiltrating data, etc.
We did not understand the seriousness of lacking hardware sources of
randomness, but that's common to many other security vulnerabilities.

Machines that lack any sources of entropy have their uses, but they're akin
to processors with no MMU. You should never run a world-accessible ssh
daemon on either of them.

> Saying the architecture must include a random number instruction
> doesn't help!

It won't fix existing systems, and is irrelevant to deeply embedded, but
communicating this requirement to SoC designers sounds like a good idea to
me. IoTrash appliance makers won't care but their security is already so
atrocious that lack of entropy is nowhere near the easiest way to get in,
while anyone else will at least notice the warning.
Any real-silicon hardware can include an entropy source, and if it doesn't,
shaming the maker is the way to go. Calling the problem a security
vulnerability (which I say it is) sends a stronger message.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ A MAP07 (Dead Simple) raspberry tincture recipe: 0.5l 95% alcohol,
⣾⠁⢠⠒⠀⣿⡁ 1kg raspberries, 0.4kg sugar; put into a big jar for 1 month.
⢿⡄⠘⠷⠚⠋⠀ Filter out and throw away the fruits (can dump them into a cake,
⠈⠳⣄⠀⠀⠀⠀ etc), let the drink age at least 3-6 months.