linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v1 09/50] <linux/random.h> prandom_u32_max() for power-of-2 ranges
@ 2019-03-16  6:32 George Spelvin
  2020-03-28 17:32 ` Stephen Hemminger
  0 siblings, 1 reply; 3+ messages in thread
From: George Spelvin @ 2019-03-16  6:32 UTC (permalink / raw)
  To: linux-kernel, lkml; +Cc: Stephen Hemminger, Hannes Frederic Sowa

Add special-case optimization for compile-time constant power-of-2
ranges (as a simple shift) so it's *always* at least as efficient as
"prandom_u32() % range".

This saves all callers from having to worry about the issue.  You
may use "% range" or "& (range - 1)" if you know the range is a
power of 2, but you don't have to.

Rename the parameter from the un-mnemonic "ep_ro" to "range".

Also expand comments, and add to prandom_32() documentation
that "prandom_u32() % range" should ''not'' be used.

Signed-off-by: George Spelvin <lkml@sdf.org>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/linux/random.h | 56 ++++++++++++++++++++++++++++++++++--------
 lib/random32.c         |  9 ++++---
 2 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/include/linux/random.h b/include/linux/random.h
index f189c927fdea5..109772175c833 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -125,20 +125,56 @@ void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
 	DO_ONCE(prandom_seed_full_state, (pcpu_state))
 
 /**
- * prandom_u32_max - returns a pseudo-random number in interval [0, ep_ro)
- * @ep_ro: right open interval endpoint
+ * prandom_u32_max - returns a pseudo-random number in interval [0, @range)
+ * @range: Upper bound on the return value; the first value never returned.
  *
- * Returns a pseudo-random number that is in interval [0, ep_ro). Note
- * that the result depends on PRNG being well distributed in [0, ~0U]
- * u32 space. Here we use maximally equidistributed combined Tausworthe
- * generator, that is, prandom_u32(). This is useful when requesting a
- * random index of an array containing ep_ro elements, for example.
+ * Returns a pseudo-random number 0 <= @x < @range.
+ * This is useful when requesting a random index into an array containing
+ * @range elements, for example.  This is similar to, but more efficient
+ * than, "prandom_u32() % @range".
  *
- * Returns: pseudo-random number in interval [0, ep_ro)
+ * It's also a tiny bit more uniform than the modulo algorithm.  If @range
+ * does not divide 1<<32, then some results will occur ceil((1<<32)/@range)
+ * times, while others will occur floor((1<<32)/range) times.  The modulo
+ * algorithm makes the lowest (1<<32) % @range possible results occur
+ * more often, and the higher values all occur slightly less often.
+ *
+ * A multiplicative algorithm spreads out the over- and underrepresented
+ * outputs evenly across the range.  For example, if @range == 7, then
+ * 2, 4 and 6 occur slighty less often than 0, 1, 3 or 5.
+ *
+ * (This all assumes the PRNG is well distributed in [0, ~0U] u32 space,
+ * a property that is provided by prandom_u32().)
+ *
+ * It is possible to extend this to avoid all bias and return perfectly
+ * uniform pseudorandom numbers by discarding and regenerating sometimes,
+ * but if your needs are that stringent, you might want to use a stronger
+ * generator (like get_random_u32()).
+ *
+ * Ref:
+ * 	Fast Random Integer Generation in an Interval
+ * 	Daniel Lemire
+ * 	https://arxiv.org/abs/1805.10941
+ *
+ * Return: pseudo-random number in interval [0, @range)
  */
-static inline u32 prandom_u32_max(u32 ep_ro)
+static inline u32 prandom_u32_max(u32 range)
 {
-	return (u32)(((u64) prandom_u32() * ep_ro) >> 32);
+	/*
+	 * If the range is a compile-time constant power of 2, then use
+	 * a simple shift.  This is mathematically equivalent to the
+	 * multiplication, but GCC 8.3 doesn't optimize that perfectly.
+	 *
+	 * We could do an AND with a mask, but
+	 * 1) The shift is the same speed on a decent CPU,
+	 * 2) It's generally smaller code (smaller immediate), and
+	 * 3) Many PRNGs have trouble with their low-order bits;
+	 *    using the msbits is generaly preferred.
+	 */
+	if (__builtin_constant_p(range) && (range & (range - 1)) == 0)
+		return prandom_u32() / (u32)(0x100000000 / range);
+	else
+		return reciprocal_scale(prandom_u32(), range);
 }
 
 /*
diff --git a/lib/random32.c b/lib/random32.c
index 763b920a6206c..fc369f4e550c2 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -75,15 +75,18 @@ EXPORT_SYMBOL(prandom_u32_state);
  *	A 32 bit pseudo-random number is generated using a fast
  *	algorithm suitable for simulation. This algorithm is NOT
  *	considered safe for cryptographic use.
+ *
+ *	To generate a random number in a specified interval, use
+ *	prandom_u32_max(), which uses reciprocal_scale().  Please do
+ *	NOT use a modulo operation (prandom_u32() % range), which is
+ *	more expensive.
  */
 u32 prandom_u32(void)
 {
 	struct rnd_state *state = &get_cpu_var(net_rand_state);
-	u32 res;
+	u32 res = prandom_u32_state(state);
 
-	res = prandom_u32_state(state);
 	put_cpu_var(net_rand_state);
-
 	return res;
 }
 EXPORT_SYMBOL(prandom_u32);
-- 
2.26.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [RFC PATCH v1 09/50] <linux/random.h> prandom_u32_max() for power-of-2 ranges
  2019-03-16  6:32 [RFC PATCH v1 09/50] <linux/random.h> prandom_u32_max() for power-of-2 ranges George Spelvin
@ 2020-03-28 17:32 ` Stephen Hemminger
  2020-03-28 18:07   ` George Spelvin
  0 siblings, 1 reply; 3+ messages in thread
From: Stephen Hemminger @ 2020-03-28 17:32 UTC (permalink / raw)
  To: George Spelvin; +Cc: linux-kernel, Hannes Frederic Sowa

On Sat, 16 Mar 2019 02:32:04 -0400
George Spelvin <lkml@sdf.org> wrote:

> +static inline u32 prandom_u32_max(u32 range)
>  {
> -	return (u32)(((u64) prandom_u32() * ep_ro) >> 32);
> +	/*
> +	 * If the range is a compile-time constant power of 2, then use
> +	 * a simple shift.  This is mathematically equivalent to the
> +	 * multiplication, but GCC 8.3 doesn't optimize that perfectly.
> +	 *
> +	 * We could do an AND with a mask, but
> +	 * 1) The shift is the same speed on a decent CPU,
> +	 * 2) It's generally smaller code (smaller immediate), and
> +	 * 3) Many PRNGs have trouble with their low-order bits;
> +	 *    using the msbits is generaly preferred.
> +	 */
> +	if (__builtin_constant_p(range) && (range & (range - 1)) == 0)
> +		return prandom_u32() / (u32)(0x100000000 / range);
> +	else
> +		return reciprocal_scale(prandom_u32(), range);


The optimization is good, but I don't thin that the compiler
is able to propogate the constant property into the function.
Did you actually check the generated code.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC PATCH v1 09/50] <linux/random.h> prandom_u32_max() for power-of-2 ranges
  2020-03-28 17:32 ` Stephen Hemminger
@ 2020-03-28 18:07   ` George Spelvin
  0 siblings, 0 replies; 3+ messages in thread
From: George Spelvin @ 2020-03-28 18:07 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: linux-kernel, Hannes Frederic Sowa, lkml

On Sat, Mar 28, 2020 at 10:32:29AM -0700, Stephen Hemminger wrote:
> On Sat, 16 Mar 2019 02:32:04 -0400
> George Spelvin <lkml@sdf.org> wrote:
> 
>> +static inline u32 prandom_u32_max(u32 range)
>>  {
>> -	return (u32)(((u64) prandom_u32() * ep_ro) >> 32);
>> +	/*
>> +	 * If the range is a compile-time constant power of 2, then use
>> +	 * a simple shift.  This is mathematically equivalent to the
>> +	 * multiplication, but GCC 8.3 doesn't optimize that perfectly.
>> +	 *
>> +	 * We could do an AND with a mask, but
>> +	 * 1) The shift is the same speed on a decent CPU,
>> +	 * 2) It's generally smaller code (smaller immediate), and
>> +	 * 3) Many PRNGs have trouble with their low-order bits;
>> +	 *    using the msbits is generaly preferred.
>> +	 */
>> +	if (__builtin_constant_p(range) && (range & (range - 1)) == 0)
>> +		return prandom_u32() / (u32)(0x100000000 / range);
>> +	else
>> +		return reciprocal_scale(prandom_u32(), range);

> The optimization is good, but I don't think that the compiler
> is able to propogate the constant property into the function.
> Did you actually check the generated code?

Yes, I checked repeatedly during development.  I just rechecked the
exact code (it's been a while), and verified that

unsigned foo(void)
{
	return prandom_u32_max(256);
}

compiles to
foo:
.LFB1:
	.cfi_startproc
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	call	prandom_u32@PLT
	shrl	$24, %eax
	addq	$8, %rsp
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE1:
	.size	foo, .-foo

But you prompted me to check a few other architectures, and
it's true for them too.  E.g. m68k:

foo:
        jsr prandom_u32
        moveq #24,%d1
        lsr.l %d1,%d0
        rts

(68k is one architecture where the mask is faster than the shift,
so I could handle it separately, but it makes the code even uglier.
Basically, use masks for small ranges, and shifts for large ranges,
and an arch-dependent threshold that depends on the available
immediate constant range.)

ARM, PowerPC, and MIPS all have some hideously large function preamble
code, but the core is a right shift.  E.g.

foo:
.LFB1:
	.cfi_startproc
	stwu 1,-16(1)
	.cfi_def_cfa_offset 16
	mflr 0
	.cfi_register 65, 0
	bcl 20,31,.L2
.L2:
	stw 30,8(1)
	.cfi_offset 30, -8
	mflr 30
	addis 30,30,.LCTOC1-.L2@ha
	stw 0,20(1)
	addi 30,30,.LCTOC1-.L2@l
	.cfi_offset 65, 4
	bl prandom_u32+32768@plt
	lwz 0,20(1)
	lwz 30,8(1)
	addi 1,1,16
	.cfi_restore 30
	.cfi_def_cfa_offset 0
	srwi 3,3,24
	mtlr 0
	.cfi_restore 65
	blr

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-03-28 18:07 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-16  6:32 [RFC PATCH v1 09/50] <linux/random.h> prandom_u32_max() for power-of-2 ranges George Spelvin
2020-03-28 17:32 ` Stephen Hemminger
2020-03-28 18:07   ` George Spelvin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).