Let me share my 2 cents:

That permutation might be safe but afaict it hasn't been analyzed wrt modern cryptographic techniques and there might well be differential characteristics, statistical biases, etc.

What about just using SipHash's permutation, already in the kernel? It works on 4*u64 words too, and 6 rounds would be enough.

Doing a basic ops count, we currently have 5 group operations and 3 rotations per round or 150 and 90 for the 30 init rounds. With SipHash it'd be 48 and 36 with the proposed 6 rounds. Probably insignificant speed wise as init is only done once but just to show that we'd get both better security assurance and better performance.

On Thu, Feb 6, 2020 at 4:10 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:

Hey Kees,

On Wed, Feb 05, 2020 at 02:39:43PM -0800, Kristen Carlson Accardi wrote:
> +#define rot(x, k) (((x)<<(k))|((x)>>(64-(k))))
> +static u64 prng_u64(struct prng_state *x)
> +{
> + u64 e;
> +
> + e = x->a - rot(x->b, 7);
> + x->a = x->b ^ rot(x->c, 13);
> + x->b = x->c + rot(x->d, 37);
> + x->c = x->d + e;
> + x->d = e + x->a;
> +
> + return x->d;
> +}

I haven't looked closely at where the original entropy sources are
coming from and how all this works, but on first glance, this prng
doesn't look like an especially cryptographically secure one. I realize
that isn't necessarily your intention (you're focused on speed), but
actually might this be sort of important? If I understand correctly, the
objective of this patch set is so that leaking the address of one
function doesn't leak the address of all other functions, as is the case
with fixed-offset kaslr. But if you leak the addresses of _some_ set of
functions, and your prng is bogus, might it be possible to figure out
the rest? For some prngs, if you give me the output stream of a few
numbers, I can predict the rest. For others, it's not this straight
forward, but there are some varieties of similar attacks. If any of that
set of concerns turns out to apply to your prng_u64 here, would that
undermine kaslr in similar ways as the current fixed-offset variety? Or
does it not matter because it's some kind of blinded fixed-size shuffle
with complex reasoning that makes this not a problem?

Jason