[RFC PATCH v1 09/50] <linux/random.h> prandom_u32_max() for power-of-2 ranges

* [RFC PATCH v1 09/50] <linux/random.h> prandom_u32_max() for power-of-2 ranges
@ 2019-03-16  6:32 George Spelvin
  2020-03-28 17:32 ` Stephen Hemminger
  0 siblings, 1 reply; 3+ messages in thread
From: George Spelvin @ 2019-03-16  6:32 UTC (permalink / raw)
  To: linux-kernel, lkml; +Cc: Stephen Hemminger, Hannes Frederic Sowa

Add special-case optimization for compile-time constant power-of-2
ranges (as a simple shift) so it's *always* at least as efficient as
"prandom_u32() % range".

This saves all callers from having to worry about the issue.  You
may use "% range" or "& (range - 1)" if you know the range is a
power of 2, but you don't have to.

Rename the parameter from the un-mnemonic "ep_ro" to "range".

Also expand comments, and add to prandom_32() documentation
that "prandom_u32() % range" should ''not'' be used.

Signed-off-by: George Spelvin <lkml@sdf.org>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/linux/random.h | 56 ++++++++++++++++++++++++++++++++++--------
 lib/random32.c         |  9 ++++---
 2 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/include/linux/random.h b/include/linux/random.h
index f189c927fdea5..109772175c833 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -125,20 +125,56 @@ void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
 	DO_ONCE(prandom_seed_full_state, (pcpu_state))
 
 /**
- * prandom_u32_max - returns a pseudo-random number in interval [0, ep_ro)
- * @ep_ro: right open interval endpoint
+ * prandom_u32_max - returns a pseudo-random number in interval [0, @range)
+ * @range: Upper bound on the return value; the first value never returned.
  *
- * Returns a pseudo-random number that is in interval [0, ep_ro). Note
- * that the result depends on PRNG being well distributed in [0, ~0U]
- * u32 space. Here we use maximally equidistributed combined Tausworthe
- * generator, that is, prandom_u32(). This is useful when requesting a
- * random index of an array containing ep_ro elements, for example.
+ * Returns a pseudo-random number 0 <= @x < @range.
+ * This is useful when requesting a random index into an array containing
+ * @range elements, for example.  This is similar to, but more efficient
+ * than, "prandom_u32() % @range".
  *
- * Returns: pseudo-random number in interval [0, ep_ro)
+ * It's also a tiny bit more uniform than the modulo algorithm.  If @range
+ * does not divide 1<<32, then some results will occur ceil((1<<32)/@range)
+ * times, while others will occur floor((1<<32)/range) times.  The modulo
+ * algorithm makes the lowest (1<<32) % @range possible results occur
+ * more often, and the higher values all occur slightly less often.
+ *
+ * A multiplicative algorithm spreads out the over- and underrepresented
+ * outputs evenly across the range.  For example, if @range == 7, then
+ * 2, 4 and 6 occur slighty less often than 0, 1, 3 or 5.
+ *
+ * (This all assumes the PRNG is well distributed in [0, ~0U] u32 space,
+ * a property that is provided by prandom_u32().)
+ *
+ * It is possible to extend this to avoid all bias and return perfectly
+ * uniform pseudorandom numbers by discarding and regenerating sometimes,
+ * but if your needs are that stringent, you might want to use a stronger
+ * generator (like get_random_u32()).
+ *
+ * Ref:
+ * 	Fast Random Integer Generation in an Interval
+ * 	Daniel Lemire
+ * 	https://arxiv.org/abs/1805.10941
+ *
+ * Return: pseudo-random number in interval [0, @range)
  */
-static inline u32 prandom_u32_max(u32 ep_ro)
+static inline u32 prandom_u32_max(u32 range)
 {
-	return (u32)(((u64) prandom_u32() * ep_ro) >> 32);
+	/*
+	 * If the range is a compile-time constant power of 2, then use
+	 * a simple shift.  This is mathematically equivalent to the
+	 * multiplication, but GCC 8.3 doesn't optimize that perfectly.
+	 *
+	 * We could do an AND with a mask, but
+	 * 1) The shift is the same speed on a decent CPU,
+	 * 2) It's generally smaller code (smaller immediate), and
+	 * 3) Many PRNGs have trouble with their low-order bits;
+	 *    using the msbits is generaly preferred.
+	 */
+	if (__builtin_constant_p(range) && (range & (range - 1)) == 0)
+		return prandom_u32() / (u32)(0x100000000 / range);
+	else
+		return reciprocal_scale(prandom_u32(), range);
 }
 
 /*
diff --git a/lib/random32.c b/lib/random32.c
index 763b920a6206c..fc369f4e550c2 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -75,15 +75,18 @@ EXPORT_SYMBOL(prandom_u32_state);
  *	A 32 bit pseudo-random number is generated using a fast
  *	algorithm suitable for simulation. This algorithm is NOT
  *	considered safe for cryptographic use.
+ *
+ *	To generate a random number in a specified interval, use
+ *	prandom_u32_max(), which uses reciprocal_scale().  Please do
+ *	NOT use a modulo operation (prandom_u32() % range), which is
+ *	more expensive.
  */
 u32 prandom_u32(void)
 {
 	struct rnd_state *state = &get_cpu_var(net_rand_state);
-	u32 res;
+	u32 res = prandom_u32_state(state);
 
-	res = prandom_u32_state(state);
 	put_cpu_var(net_rand_state);
-
 	return res;
 }
 EXPORT_SYMBOL(prandom_u32);
-- 
2.26.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread