* [RFC PATCH v1 10/50] <linux/random.h> Make prandom_u32_max() exact for constant ranges
@ 2019-12-21  3:30 George Spelvin
  0 siblings, 0 replies; only message in thread
From: George Spelvin @ 2019-12-21  3:30 UTC (permalink / raw)
  To: linux-kernel, lkml; +Cc: Stephen Hemminger, Hannes Frederic Sowa

This is a cute hack I'm not sure should go into the kernel.
Comments very much requested!

Getting rid of the last tiny bit of non-uniformity is quite cheap
*if* the range is known at compile time: a compare with an immediate
and a (rarely taken) conditional branch to retry.

So for hack value, implement this for compile-time constant ranges.

For variable ranges, it involves a division, so just live with the
slight non-uniformity.

The style questions are:
- Is a more accurate result *some* of the time actually worth anything?
- Is the benefit enough to justify the ugly inconsistency?

Signed-off-by: George Spelvin <lkml@sdf.org>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/linux/random.h | 63 ++++++++++++++++++++++++++++++------------
 1 file changed, 46 insertions(+), 17 deletions(-)

diff --git a/include/linux/random.h b/include/linux/random.h
index 109772175c833..82dd0d613e75c 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -9,6 +9,7 @@
 
 #include <linux/list.h>
 #include <linux/once.h>
+#include <linux/math64.h>
 
 #include <uapi/linux/random.h>
 
@@ -147,9 +148,11 @@ void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
  * a property that is provided by prandom_u32().)
  *
  * It is possible to extend this to avoid all bias and return perfectly
- * uniform pseudorandom numbers by discarding and regenerating sometimes,
- * but if your needs are that stringent, you might want to use a stronger
- * generator (like get_random_u32()).
+ * uniform pseudorandom numbers by discarding and regenerating sometimes.
+ * That's easy to do for constant ranges, so this code does it in that
+ * case, but settles for the approximation for variable ranges.  If you
+ * need exact uniformity, you might also want to use a stronger generator
+ * (like get_random_u32()).
  *
  * Ref:
  * 	Fast Random Integer Generation in an Interval
@@ -160,21 +163,47 @@ void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
  */
 static inline u32 prandom_u32_max(u32 range)
 {
-	/*
-	 * If the range is a compile-time constant power of 2, then use
-	 * a simple shift.  This is mathematically equivalent to the
-	 * multiplication, but GCC 8.3 doesn't optimize that perfectly.
-	 *
-	 * We could do an AND with a mask, but
-	 * 1) The shift is the same speed on a decent CPU,
-	 * 2) It's generally smaller code (smaller immediate), and
-	 * 3) Many PRNGs have trouble with their low-order bits;
-	 *    using the msbits is generaly preferred.
-	 */
-	if (__builtin_constant_p(range) && (range & (range - 1)) == 0)
-		return prandom_u32() / (u32)(0x100000000 / range);
-	else
+	if (!__builtin_constant_p(range)) {
+		/*
+		 * If range is a variable, prioritize speed over
+		 * perfect uniformity.
+		 */
 		return reciprocal_scale(prandom_u32(), range);
+	} else if (range & (range - 1)) {
+		/*
+		 * If the range is a compile-time constant, then achieving
+		 * perfect uniformity is one compare with immediate and
+		 * one unlikely branch, so go ahead and do it.
+		 *
+		 * Some 32-bit processors require an additional
+		 * instruction to get the low half of a 64-bit product.
+		 * PowerPC and NIOS have separate "multiply high" and
+		 * "multiply low" instructions.  MIPS32 and ARC need to
+		 * move the result from a special-purpose register to
+		 * a GPR.
+		 */
+		u32 const lim = -range % range;
+		u64 prod;
+
+		do
+			prod = mul_u32_u32(prandom_u32(), range);
+		while (unlikely((u32)prod < lim));
+		return prod >> 32;
+	} else {
+		/*
+		 * If the range is a compile-time constant power of 2,
+		 * then use a simple shift.  This is mathematically
+		 * equivalent to the multiplication, but GCC 8.3 doesn't
+		 * optimize that perfectly.
+		 *
+		 * We could do an AND with a mask, but
+		 * 1) The shift is the same speed on a decent CPU,
+		 * 2) It's generally smaller code (smaller immediate), and
+		 * 3) Many PRNGs have trouble with their low-order bits;
+ *    using the msbits is generally preferred.
+		 */
+		return prandom_u32() / (u32)(0x100000000 / range);
+	}
 }
 
 /*
-- 
2.26.0

