* [RFC PATCH v1 10/50] <linux/random.h> Make prandom_u32_max() exact for constant ranges
@ 2019-12-21  3:30 George Spelvin
  0 siblings, 0 replies; only message in thread
From: George Spelvin @ 2019-12-21  3:30 UTC (permalink / raw)
  To: linux-kernel, lkml; +Cc: Stephen Hemminger, Hannes Frederic Sowa

This is a cute hack I'm not sure should go into the kernel.
Comments very much requested!

Getting rid of the last tiny bit of non-uniformity is quite cheap
*if* the range is known at compile time: a compare with an immediate
and a (rarely taken) conditional branch to retry.

So for hack value, implement this for compile-time constant ranges.

For variable ranges, it involves a division, so just live with the
slight non-uniformity.

The style questions are:
- Is a more accurate result *some* of the time actually worth anything?
- Is the benefit enough to justify the ugly inconsistency?

Signed-off-by: George Spelvin <lkml@sdf.org>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/linux/random.h | 63 ++++++++++++++++++++++++++++++------------
 1 file changed, 46 insertions(+), 17 deletions(-)

diff --git a/include/linux/random.h b/include/linux/random.h
index 109772175c833..82dd0d613e75c 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -9,6 +9,7 @@
 
 #include <linux/list.h>
 #include <linux/once.h>
+#include <linux/math64.h>
 
 #include <uapi/linux/random.h>
 
@@ -147,9 +148,11 @@ void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
  * a property that is provided by prandom_u32().)
  *
  * It is possible to extend this to avoid all bias and return perfectly
- * uniform pseudorandom numbers by discarding and regenerating sometimes,
- * but if your needs are that stringent, you might want to use a stronger
- * generator (like get_random_u32()).
+ * uniform pseudorandom numbers by discarding and regenerating sometimes.
+ * That's easy to do for constant ranges, so this code does it in that
+ * case, but settles for the approximation for variable ranges.  If you
+ * need exact uniformity, you might also want to use a stronger generator
+ * (like get_random_u32()).
  *
  * Ref:
  * 	Fast Random Integer Generation in an Interval
@@ -160,21 +163,47 @@ void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
  */
 static inline u32 prandom_u32_max(u32 range)
 {
-	/*
-	 * If the range is a compile-time constant power of 2, then use
-	 * a simple shift.  This is mathematically equivalent to the
-	 * multiplication, but GCC 8.3 doesn't optimize that perfectly.
-	 *
-	 * We could do an AND with a mask, but
-	 * 1) The shift is the same speed on a decent CPU,
-	 * 2) It's generally smaller code (smaller immediate), and
-	 * 3) Many PRNGs have trouble with their low-order bits;
-	 *    using the msbits is generaly preferred.
-	 */
-	if (__builtin_constant_p(range) && (range & (range - 1)) == 0)
-		return prandom_u32() / (u32)(0x100000000 / range);
-	else
+	if (!__builtin_constant_p(range)) {
+		/*
+		 * If range is a variable, prioritize speed over
+		 * perfect uniformity.
+		 */
 		return reciprocal_scale(prandom_u32(), range);
+	} else if (range & (range - 1)) {
+		/*
+		 * If the range is a compile-time constant, then achieving
+		 * perfect uniformity is one compare with immediate and
+		 * one unlikely branch, so go ahead and do it.
+		 *
+		 * Some 32-bit processors require an additional
+		 * instruction to get the low half of a 64-bit product.
+		 * PowerPC and NIOS have separate "multiply high" and
+		 * "multiply low" instructions.  MIPS32 and ARC need to
+		 * move the result from a special-purpose register to
+		 * a GPR.
+		 */
+		u32 const lim = -range % range;
+		u64 prod;
+
+		do
+			prod = mul_u32_u32(prandom_u32(), range);
+		while (unlikely((u32)prod < lim));
+		return prod >> 32;
+	} else {
+		/*
+		 * If the range is a compile-time constant power of 2,
+		 * then use a simple shift.  This is mathematically
+		 * equivalent to the multiplication, but GCC 8.3 doesn't
+		 * optimize that perfectly.
+		 *
+		 * We could do an AND with a mask, but
+		 * 1) The shift is the same speed on a decent CPU,
+		 * 2) It's generally smaller code (smaller immediate), and
+		 * 3) Many PRNGs have trouble with their low-order bits;
+ *    using the msbits is generally preferred.
+		 */
+		return prandom_u32() / (u32)(0x100000000 / range);
+	}
 }
 
 /*
-- 
2.26.0

