Re: x86/random: Speculation to the rescue

From: Linus Torvalds <torvalds@linux-foundation.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Theodore Y. Ts'o" <tytso@mit.edu>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Ahmed S. Darwish" <darwish.07@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Nicholas Mc Guire <hofrat@opentech.at>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	Andy Lutomirski <luto@kernel.org>,
	Kees Cook <keescook@chromium.org>
Subject: Re: x86/random: Speculation to the rescue
Date: Mon, 30 Sep 2019 10:03:46 -0700	[thread overview]
Message-ID: <CAHk-=wjC0vTOgFU=dhX5NQxF84MBaGNpoQ1M6wD=yzBEy4tzTw@mail.gmail.com> (raw)
In-Reply-To: <20190930163215.GH4519@hirez.programming.kicks-ass.net>

On Mon, Sep 30, 2019 at 9:32 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> In my experience LFSRs are good at defeating branch predictors, which
> would make even in-order cores suffer lots of branch misses. And that
> might be enough, maybe.

Agreed, branch mis-prediction is likely fairly hard to take into
account ahead of time even in an in-order CPU.

But when you know the LFSR, and you know the architecture, you could
just re-create the timing, and have a fairly high chance of getting
the same complex pattern.

And in the simple enough (ie bad) case - the embedded world - you
don't need to "know" or do any deep analysis of anything or try to
predict it ahead of time. You just look at what another identical
machine does when given the identical starting point.

So I don't think an LFSR is all that great on its own. It's
complicated to predict, and it gives odd patterns, but on an in-order
core I'm not convinced it gives sufficiently _different_ odd patterns
across booting.

This, btw, is why you shouldn't trust the "I ran the thing a billion
times" on my PC, even if you were to have an old in-order Atom CPU
available to you. If you didn't restart the whole CPU state from an
identical starting point as you re-run them, the differences you see
may simply not be real. They may be an artificial effect of cumulative
changes to internal CPU branch prediction arrays and cache tag layout.

I don't think it's a huge issue if you have a real load, and you have
_any_ source of entropy at all, but I really do not think that an LFSR
is necessarily a good idea. It's just _too_ identical across reboots,
and will have very similar (but yes, complex due to branch prediction)
behavior across different runs.

Of course, in the "completely and utterly identical state and
absolutely no timing differences anywhere" situation, even my "take
timer interrupts and force at least cache misses on SMP" model doesn't
protect you from just re-running the 100% identical sequence.

But when it's a more complex load than an LFSR, I personally at least
feel better about it. An LFSR I can well imagine will give the exact
same (odd) timing patterns across boots even if there were earlier
minor changes. But hopefully a bigger load with just a more complex
footprint will have more of that. More cache misses, more DRAM
accesses, more branch mispredicts, more "pipeline was broken in a
_slightly_ different place due to timer".

It is also, btw, why I don't mix in TSC _differences_ when I mix
things in. I think it's better to actually mix in the TSC value
itself. Even if you re-run the LFSR, and it has the exact same branch
mis-predicts (because it's the same LFSR), if there were any timing
differences from _anything_ else before you ran that LFSR, then the
bits you'll be mixing in are different across boots. But if you mix in
the relative difference, you might be mixing in the identical bits.

The only real difference is only the initial TSC value, of course, so
the added entropy is small. But when we're talking about trying to get
to a total of 256 bits, a couple of bits here and there end up
mattering.

But no. Never any _guarantees_. There is no absolute security. Only best effort.

An OoO CPU will have a _lot_ more internal state, and a lot of things
that perturb that internal state, and that will make small changes in
timing cause more chaos in the end. Much less to worry about.

An in-order CPU will have less internal state, and so less
perturbations and sources of real entropy from small differences. We
can only hope there is _some_.

It's not like our existing "depend on external interrupt timing" is
any hard guarantee either, regardless of how long we wait or how many
external interrupts we'd get.

It's always a "at some point you have to make a judgement call".

And we all have different levels of comfort about where that point
ends up being.

               Linus