From: George Spelvin <lkml@sdf.org>
To: daniel@iogearbox.net, hannes@stressinduktion.org, lkml@sdf.org
Cc: netdev@vger.kernel.org
Subject: Re: Revising prandom_32 generator
Date: Tue, 26 Mar 2019 19:07:01 GMT
Message-ID: <201903261907.x2QJ7148017418@sdf.org>
In-Reply-To: <109c67e3-7cda-40cf-80e1-a2d3500a2b5d@www.fastmail.com>

On Tue, 26 Mar 2019 at 14:03:55 -0400, Hannes Frederic Sowa wrote:
> On Tue, Mar 26, 2019, at 12:10, George Spelvin wrote:
> The conversation definitely makes sense.

I also fixed prandom's seeding; it now uses the
random_ready_callback system.

> Are you trying to fix the modulo biases? I think that prandom_u32_max
> also has bias, would that be worth fixing as well?

I could if you like.  I submitted a patch (appended) to do that
for /dev/urandom (using a very clever new algorithm that does it
all with a single multiply in the common case!), but didn't think
the prandom users were critical enough to care.

I documented the bias as follows:
/**
 * prandom_u32_max - returns a pseudo-random number in interval [0, range)
 * @range: Size of random interval [0, range); the first value not included.
 *
 * Returns a pseudo-random number that is in interval [0, range).
 * This is useful when requesting a random index of an array containing
 * range elements, for example.  This is similar to, but more efficient
 * than, "prandom_u32() % range".
 *
 * It's also a tiny bit more uniform than the modulo algorithm.  If range
 * does not divide 1<<32, then some results will occur ceil((1<<32)/range)
 * times, while others will occur floor((1<<32)/range) times.  The
 * "% range" algorithm makes the lowest (1<<32) % range possible results
 * occur more often, and the higher values all occur slightly less often.
 *
 * A multiplicative algorithm spreads out the over- and underrepresented
 * outputs evenly across the range.  For example, if range == 7, then
 * 2, 4 and 6 occur slightly less often than 0, 1, 3 or 5.
 *
 * Note that the result depends on the PRNG being well distributed in
 * [0, ~0U] u32 space, a property that is provided by prandom_u32().
 *
 * It is possible to extend this to avoid all bias and return perfectly
 * uniform pseudorandom numbers by discarding and regenerating sometimes,
 * but if your needs are that stringent, you should be using a stronger
 * generator.
 *
 * Ref:
 *	Fast Random Integer Generation in an Interval
 *	Daniel Lemire
 *	https://arxiv.org/abs/1805.10941
 *
 * Returns: pseudo-random number in interval [0, range)
 */
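To make the over/underrepresentation pattern concrete, here is a user-space
sketch (not kernel code) scaled down to a 16-bit generator word so the whole
input space can be enumerated; the constants 7 and 16 are illustrative only:

```c
#include <stdint.h>

#define WORD 16		/* scaled-down generator width: 16 bits, not 32 */
#define RANGE 7u

/* Tally how often each output value occurs under "x % RANGE" when x
 * sweeps the entire 16-bit input space. */
void count_modulo(uint32_t counts[RANGE])
{
	uint32_t x, k;

	for (k = 0; k < RANGE; k++)
		counts[k] = 0;
	for (x = 0; x < (1u << WORD); x++)
		counts[x % RANGE]++;
}

/* The same tally for the multiplicative method "(x * RANGE) >> WORD". */
void count_multiply(uint32_t counts[RANGE])
{
	uint32_t x, k;

	for (k = 0; k < RANGE; k++)
		counts[k] = 0;
	for (x = 0; x < (1u << WORD); x++)
		counts[(x * RANGE) >> WORD]++;
}
```

At 16 bits, 65536 = 7*9362 + 2, so two outputs occur 9363 times and five
occur 9362 times under either method; modulo lands the extra hits on the
lowest outputs 0 and 1, while the multiply spreads them out (here, on 0
and 3).  Which particular outputs are overweighted depends on the word
size, which is why the 32-bit example in the comment lists a different set.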

> I think tausworthe is not _trivial_ to predict, what about your
> proposed algorithms? I think it is a nice-to-have safety net in
> case too many random numbers accidentally leak (despite reseeding).

lfsr113 is indeed trivial to predict.  It's a 113-bit LFSR defined
by a degree-113 polynomial.  (The implementation as four separate
polynomials of degree 31, 29, 28 and 25 doesn't change this.)  Given
any 113 bits of its output (not necessarily consecutive), that's
113 boolean linear equations in 113 unknowns to find the internal
state.

I don't have PoC code, but Gaussian elimination is not exactly
rocket science.
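To show the shape of the computation, here is a sketch of that attack on a
toy 16-bit Fibonacci LFSR.  Everything here (the width, the tap mask, the
function names) is illustrative, not lfsr113's actual parameters; the real
attack is the same linear algebra with a 113x113 matrix.

```c
#include <stdint.h>

#define LFSR_TAPS 0x002Du	/* toy 16-bit maximal LFSR, taps 0, 2, 3, 5 */

/* One step of the toy Fibonacci LFSR: emit bit 0, shift right, feed the
 * tap parity back in at the top. */
unsigned lfsr_step(uint16_t *s)
{
	unsigned out = *s & 1;
	uint16_t fb = (uint16_t)__builtin_parity(*s & LFSR_TAPS);

	*s = (uint16_t)((*s >> 1) | (fb << 15));
	return out;
}

/* Recover the unknown initial state from 16 output bits observed after a
 * 16-bit gap (so the equations are not just the state bits themselves).
 * Each state bit is tracked symbolically as a mask over the 16 unknown
 * initial bits; the observed bits then give 16 GF(2)-linear equations in
 * 16 unknowns, solved by Gauss-Jordan elimination. */
uint16_t lfsr_recover(const unsigned out[16])
{
	uint32_t eq[16];	/* bits 0..15: coefficients; bit 16: observed bit */
	uint16_t sym[16];	/* sym[j]: unknowns that XOR into state bit j */
	uint16_t state = 0;
	unsigned i, j, col;

	for (j = 0; j < 16; j++)
		sym[j] = (uint16_t)(1u << j);

	/* Symbolic replay of 32 steps; record equations for steps 16..31. */
	for (i = 0; i < 32; i++) {
		uint16_t fb = 0;

		if (i >= 16)
			eq[i - 16] = sym[0] | ((uint32_t)out[i - 16] << 16);
		for (j = 0; j < 16; j++)
			if ((LFSR_TAPS >> j) & 1)
				fb ^= sym[j];
		for (j = 0; j < 15; j++)
			sym[j] = sym[j + 1];
		sym[15] = fb;
	}

	/* Gauss-Jordan elimination over GF(2); the system is invertible
	 * because the LFSR's update map is, so a pivot always exists. */
	for (col = 0; col < 16; col++) {
		uint32_t tmp;
		unsigned piv = col;

		while (!((eq[piv] >> col) & 1))
			piv++;
		tmp = eq[col]; eq[col] = eq[piv]; eq[piv] = tmp;
		for (i = 0; i < 16; i++)
			if (i != col && ((eq[i] >> col) & 1))
				eq[i] ^= eq[col];
	}

	/* Row i now reads "initial bit i = observed combination". */
	for (i = 0; i < 16; i++)
		state |= (uint16_t)(((eq[i] >> 16) & 1) << i);
	return state;
}
```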

All of the proposed replacements are considerably less linear and
harder to invert.  The xoshiro family are the most linear,
with an LFSR core and only a very simple non-linear state masking.


Here's the patch mentioned above for unbiased range reduction.

I just thought of a great improvement!  The slow computation
is "-range % range".  I could use __builtin_constant_p(range)
to decide between the following implementation and one that
has -range % range pre-computed and passed in.  I.e.

static inline u32 get_random_max32(u32 range)
{
	if (__builtin_constant_p(range))
		return get_random_max32_const(range, -range % range);
	else
		return get_random_max32_var(range);
}
u32 get_random_max32_const(u32 range, u32 lim)
{
	u64 prod;
	do
		prod = mul_u32_u32(get_random_u32(), range);
	while (unlikely((u32)prod < lim));
	return prod >> 32;
}
u32 get_random_max32_var(u32 range)
{
	u64 prod = mul_u32_u32(get_random_u32(), range);

	if ((u32)prod < range - 1) {
		u32 lim = -range % range;	/* = (1<<32) % range */

		while ((u32)prod < lim)
			prod = mul_u32_u32(get_random_u32(), range);
	}
	return prod >> 32;
}
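For what it's worth, Lemire's rejection rule is easy to sanity-check
exhaustively at a reduced word size.  This user-space sketch (a 16-bit
analog with my own naming, not part of the patch) enumerates every input
and confirms that after rejection the distribution is exactly uniform:

```c
#include <stdint.h>

#define WORDSPACE 0x10000u	/* 16-bit analog of the 32-bit input space */

/* Verify the rejection rule at 16 bits: after discarding inputs whose
 * product's low half is below lim = 2^16 % range, every output k in
 * [0, range) is produced by exactly floor(2^16 / range) of the
 * surviving inputs.  Returns 1 if the distribution is exact. */
int lemire_exact16(uint32_t range)
{
	static uint32_t counts[WORDSPACE];	/* static: too big for the stack */
	uint32_t lim = (WORDSPACE - range) % range;	/* -range % range, at 16 bits */
	uint32_t x, k;

	for (k = 0; k < range; k++)
		counts[k] = 0;
	for (x = 0; x < WORDSPACE; x++) {
		uint32_t prod = x * range;

		if ((prod & 0xffffu) >= lim)	/* accepted: no retry needed */
			counts[prod >> 16]++;
	}
	for (k = 0; k < range; k++)
		if (counts[k] != WORDSPACE / range)
			return 0;
	return 1;
}
```

Note the check covers ranges that share a factor with 2^16 as well as
coprime ones; Lemire's Lemma 4.1 holds for both.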
 
From: George Spelvin <lkml@sdf.org>
Subject: [RFC PATCH] random: add get_random_max() function
To: linux-kernel@vger.kernel.org,
    Theodore Ts'o <tytso@mit.edu>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>,
    George Spelvin <lkml@sdf.org>

RFC because:
- Only lightly tested so far, and
- Expecting feedback on the function names

I got annoyed that randomize_page() wasn't exactly uniform, and used the
slow modulo-based range reduction rather than the fast multiplicative one,
so I fixed it.

I figure, if you're going to the trouble of using cryptographic random
numbers, you don't want biased outputs, even if it's slight.

Signed-off-by: George Spelvin <lkml@sdf.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/random.c  | 106 +++++++++++++++++++++++++++++++++++++----
 include/linux/random.h |  32 +++++++++++++
 2 files changed, 130 insertions(+), 8 deletions(-)

diff --git a/include/linux/random.h b/include/linux/random.h
index feef20fa7e87..cf00fc9cd706 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -60,6 +60,38 @@ static inline unsigned long get_random_long(void)
 #endif
 }
 
+u32 get_random_max32(u32 range);
+#ifdef CONFIG_64BIT
+u64 get_random_max64(u64 range);
+#endif
+
+/**
+ * get_random_max - Generate a uniform random number 0 <= x < range
+ * @range:	Modulus for random number; must not be zero!
+ */
+#if BITS_PER_LONG == 64
+/*
+ * We inline the check for 32-bit range on the grounds that it can
+ * often be optimized out because range is a compile-time constant
+ * or passed from a 32-bit variable.
+ *
+ * We could get fancier with __builtin_constant_p, but it's not used
+ * often enough to be worth the hassle.
+ */
+static inline unsigned long get_random_max(unsigned long range)
+{
+	if (range <= 0xffffffff)
+		return get_random_max32(range);
+	else
+		return get_random_max64(range);
+}
+#else
+static inline unsigned long get_random_max(unsigned long range)
+{
+	return get_random_max32(range);
+}
+#endif
+
 /*
  * On 64-bit architectures, protect against non-terminated C string overflows
  * by zeroing out the first byte of the canary; this leaves 56 bits of entropy.
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 9f4a348c7eb5..a891886d6228 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2353,6 +2353,92 @@ static void invalidate_batched_entropy(void)
 	write_unlock_irqrestore(&batched_entropy_reset_lock, flags);
 }
 
+/**
+ * get_random_max32 - Generate a uniform random number 0 <= x < range < 2^32
+ * @range:	Modulus for random number; must not be zero!
+ *
+ * This generates an exactly uniform distribution over [0, range).
+ * If we're going to go to the expense of a cryptographic RNG, we
+ * may as well get the distribution right.
+ *
+ * This uses Lemire's multiplicative algorithm, from
+ *	Fast Random Integer Generation in an Interval
+ *	https://arxiv.org/abs/1805.10941
+ *
+ * The standard multiplicative range technique is (rand32() * range) >> 32.
+ * However, there are either floor((1<<32) / range) or ceil((1<<32) / range)
+ * random values that will result in each output value.  More specifically,
+ * lim = (1 << 32) % range values will be generated one extra time.
+ *
+ * Lemire proves (Lemma 4.1) that those extra values can be identified
+ * by the lsbits of the product.  If you discard and retry whenever the
+ * lsbits are less than lim, you get the floor option always.
+ */
+u32 get_random_max32(u32 range)
+{
+	u64 prod = mul_u32_u32(get_random_u32(), range);
+
+	/*
+	 * Fast-path check: lim <= range - 1, so lsbits >= range - 1
+	 * implies lsbits >= lim and no retry is needed.
+	 */
+	if ((u32)prod < range - 1) {
+		/* Slow path; we need to divide to check exactly */
+		u32 lim = -range % range;	/* = (1<<32) % range */
+
+		while ((u32)prod < lim)
+			prod = mul_u32_u32(get_random_u32(), range);
+	}
+	return prod >> 32;
+}
+EXPORT_SYMBOL(get_random_max32);
+
+#ifdef CONFIG_64BIT
+/**
+ * get_random_max64 - Generate a uniform random number 0 <= x < range < 2^64
+ * @range:	Modulus for random number; must not be zero!
+ *
+ * Like get_random_max32, but for larger ranges.  There are two
+ * implementations, depending on CONFIG_ARCH_SUPPORTS_INT128.
+ *
+ * This function does not special-case 32-bit ranges; use the generic
+ * get_random_max() for that.
+ */
+u64 get_random_max64(u64 range)
+{
+#if defined(CONFIG_ARCH_SUPPORTS_INT128) && defined(__SIZEOF_INT128__)
+	/* Same as get_random_max32(), with larger data types. */
+	unsigned __int128 prod = (unsigned __int128)get_random_u64() * range;
+
+	if ((u64)prod < range - 1) {
+		/* Slow path; we need to divide to check exactly */
+		u64 lim = -range % range;	/* = (1<<64) % range */
+
+		while ((u64)prod < lim)
+			prod = (unsigned __int128)get_random_u64() * range;
+	}
+	return prod >> 64;
+#else
+	/*
+	 * We don't have efficient 128-bit products (e.g. SPARCv9), so
+	 * break it into two halves: choose the high bits, then
+	 * choose the low bits depending on them.
+	 *
+	 * Note that if range >= 0xffffffff00000001, then range_high
+	 * will wrap to 0.  This is handled correctly.
+	 */
+	u32 range_high = (range + 0xffffffff) >> 32;
+	u32 msbits = likely(range_high) ? get_random_max32(range_high)
+					: get_random_u32();
+	u32 lsbits = likely(msbits != (u32)(range >> 32))
+			? get_random_u32()
+			: get_random_max32((u32)range);
+	return (u64)msbits << 32 | lsbits;
+#endif
+}
+EXPORT_SYMBOL(get_random_max64);
+#endif /* CONFIG_64BIT */
+
 /**
  * randomize_page - Generate a random, page aligned address
  * @start:	The smallest acceptable address the caller will take.
@@ -2370,20 +2456,24 @@ static void invalidate_batched_entropy(void)
 unsigned long
 randomize_page(unsigned long start, unsigned long range)
 {
-	if (!PAGE_ALIGNED(start)) {
-		range -= PAGE_ALIGN(start) - start;
-		start = PAGE_ALIGN(start);
-	}
+	/* How much to round up to make start page-aligned */
+	unsigned long align = -start & (PAGE_SIZE - 1);
 
-	if (start > ULONG_MAX - range)
-		range = ULONG_MAX - start;
+	if (unlikely(range < align))
+		return start;
+
+	start += align;
+	range -= align;
+
+	if (unlikely(range > -start))
+		range = -start;
 
 	range >>= PAGE_SHIFT;
 
-	if (range == 0)
+	if (unlikely(range == 0))
 		return start;
 
-	return start + (get_random_long() % range << PAGE_SHIFT);
+	return start + (get_random_max(range) << PAGE_SHIFT);
 }
 
 /* Interface for in-kernel drivers of true hardware RNGs.
-- 
2.20.1



Thread overview: 9+ messages
2019-03-26 11:10 Revising prandom_32 generator George Spelvin
2019-03-26 11:17 ` George Spelvin
2019-03-26 18:03   ` Hannes Frederic Sowa
2019-03-26 19:07     ` George Spelvin [this message]
2019-03-26 19:23       ` Stephen Hemminger
2019-03-27 18:32       ` Hannes Frederic Sowa
2019-03-27 21:43         ` George Spelvin
2019-03-26 14:58 ` Stephen Hemminger
2019-03-26 16:24   ` George Spelvin
