All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [RFC PATCH] random: add get_random_max() function
       [not found] <201903241244.x2OCiL8P011277@sdf.org>
@ 2019-03-24 20:47 ` Jason A. Donenfeld
  2019-03-25  0:28   ` George Spelvin
  2019-03-25  1:14   ` George Spelvin
       [not found] ` <201903270655.x2R6tDjo020894@sdf.org>
  1 sibling, 2 replies; 4+ messages in thread
From: Jason A. Donenfeld @ 2019-03-24 20:47 UTC (permalink / raw)
  To: George Spelvin; +Cc: LKML, Theodore Ts'o

I generally use a slightly simpler algorithm in various different projects:

//[0, bound)
static unsigned long random_bounded(unsigned long bound)
{
       unsigned long ret;
       const unsigned long max_mod_bound = (1 + ~bound) % bound;

       if (bound < 2)
               return 0;
       do
               ret = random_integer();
       while (ret < max_mod_bound);
       return ret % bound;
}

//[min, max_plus_one)
static unsigned long random_range(unsigned long min, unsigned long max_plus_one)
{
       return random_bounded(max_plus_one - min) + min;
}

Is the motivation behind using Lemire that you avoid the division (via
the modulo) in favor of a multiplication?

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] random: add get_random_max() function
  2019-03-24 20:47 ` [RFC PATCH] random: add get_random_max() function Jason A. Donenfeld
@ 2019-03-25  0:28   ` George Spelvin
  2019-03-25  1:14   ` George Spelvin
  1 sibling, 0 replies; 4+ messages in thread
From: George Spelvin @ 2019-03-25  0:28 UTC (permalink / raw)
  To: Jason, lkml; +Cc: linux-kernel, tytso

On Sun, 24 Mar 2019 at 21:47:50 +0100, Jason A. Donenfeld wrote:
> I generally use a slightly simpler algorithm in various different projects:
> 
> //[0, bound)
> static unsigned long random_bounded(unsigned long bound)
> {
>        unsigned long ret;
>        const unsigned long max_mod_bound = (1 + ~bound) % bound;
> 
>        if (bound < 2)
>                return 0;
>        do
>                ret = random_integer();
>        while (ret < max_mod_bound);
>        return ret % bound;
> }
>
> Is the motivation behind using Lemire that you avoid the division (via
> the modulo) in favor of a multiplication?

Yes.  If we define eps = max_mod_bound * ldexp(1.0, -BITS_PER_LONG) as
the probability of one retry, and retries = eps / (1 - eps) as the
expected number of retries, then both algorithms take 1+retries
random_integer()s.

The above agorithm takes 2 divisions, always.  Divides are slow, and
usually not pipelined, so two in short succession gets a latency penalty.

Lemire's mutiplicative algorithm takes 1 multiplication on the fast
path (probability 1 - 2*eps on average), 1 additional division on the slow
path (probability 2*eps), and 1 multiplication per retry.

In the common case when bound is much less than ULONG_MAX, eps is
tiny and the fast path is taken almost all the time, and it's
a huge win.

Even in the absolute worst case of bound = ULONG_MAX/2 + 2 when
eps ~ 0.5 (2 multiplies, 0.5 divide; there's no 2*eps penalty in
this case), it's faster as long as 2 mutiplies cost less than 1.5
divides.

I you want simpler code, we could omit the fast path and stil get
a speedup.  But a predictable branch for a divide seemed like
a worthwhile trade.


(FYI, this all came about as a side project of a kernel-janitor project
to replace "prandom_u32() % range" by "prandom_u32() * range >> 32".
I'm also annoyed that get_random_u32() and get_random_u64() have
separate buffers, even if EFFICIENT_UNALIGNED_ACCESS, but that's
a separate complaint.)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] random: add get_random_max() function
  2019-03-24 20:47 ` [RFC PATCH] random: add get_random_max() function Jason A. Donenfeld
  2019-03-25  0:28   ` George Spelvin
@ 2019-03-25  1:14   ` George Spelvin
  1 sibling, 0 replies; 4+ messages in thread
From: George Spelvin @ 2019-03-25  1:14 UTC (permalink / raw)
  To: Jason, lkml; +Cc: linux-kernel, tytso

P.S. The cited paper calls your algorithm the "OpenBSD algorithm"
and has a bunch of benchmarks comparing it to others in Fisher-Yates
shuffles of sizes 1e3..1e9.

Including all overhead (base PRNG, shuffle), it's 3x slower for
32-bit operations and 8x slower for 64-bit up to arrays of size
1e6, after which cache misses slow all algorithms, reducing the
ratio.

If you want a faster division-based agorithm, the "Java algorithm"
does 1+retries divides:

unsigned long java(unsigned long s)
{
	unsigned long x, r;

	do {
		x = random_integer();
		r = x % s;
	} while (x - r > -s);
	return r;
}

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH v2] random: add get_random_max() function
       [not found] ` <201903270655.x2R6tDjo020894@sdf.org>
@ 2019-03-28 10:00   ` George Spelvin
  0 siblings, 0 replies; 4+ messages in thread
From: George Spelvin @ 2019-03-28 10:00 UTC (permalink / raw)
  To: linux-kernel, lkml, tytso; +Cc: Jason

By the way, I just noticed that my fallback get_random_max64()
algorithm (if there's no __int128 type) is completely broken and
will need rewriting.

It would work if I rejected and regenerated the high half
if the low half were out of range, but that's not what it does.

The worst case is a range of 0x10000001, where it would return
0x10000000 half the time.

Needs rethinking to find something as simple as possible.
I'm sure I can come up with something, but I'm not averse to
suggestions if anyone has any.

(If I had a reliably fast clz/fls, that would open some
possibilities, but sigh...)

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-03-28 10:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <201903241244.x2OCiL8P011277@sdf.org>
2019-03-24 20:47 ` [RFC PATCH] random: add get_random_max() function Jason A. Donenfeld
2019-03-25  0:28   ` George Spelvin
2019-03-25  1:14   ` George Spelvin
     [not found] ` <201903270655.x2R6tDjo020894@sdf.org>
2019-03-28 10:00   ` [RFC PATCH v2] " George Spelvin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.