* Re: [RFC PATCH] random: add get_random_max() function
  [not found] <201903241244.x2OCiL8P011277@sdf.org>
@ 2019-03-24 20:47 ` Jason A. Donenfeld
  2019-03-25  0:28 ` George Spelvin
  2019-03-25  1:14 ` George Spelvin
  [not found] ` <201903270655.x2R6tDjo020894@sdf.org>
  1 sibling, 2 replies; 4+ messages in thread

From: Jason A. Donenfeld @ 2019-03-24 20:47 UTC (permalink / raw)
To: George Spelvin; +Cc: LKML, Theodore Ts'o

I generally use a slightly simpler algorithm in various different projects:

// [0, bound)
static unsigned long random_bounded(unsigned long bound)
{
	unsigned long ret, max_mod_bound;

	if (bound < 2)
		return 0;
	// 1 + ~bound == -bound, i.e. 2^BITS_PER_LONG mod bound; computed
	// after the bound check so bound == 0 cannot divide by zero.
	max_mod_bound = (1 + ~bound) % bound;
	do
		ret = random_integer();
	while (ret < max_mod_bound);
	return ret % bound;
}

// [min, max_plus_one)
static unsigned long random_range(unsigned long min, unsigned long max_plus_one)
{
	return random_bounded(max_plus_one - min) + min;
}

Is the motivation behind using Lemire that you avoid the division (via
the modulo) in favor of a multiplication?
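[For anyone who wants to poke at the rejection-sampling algorithm above, here is a self-contained sketch. The fixed-seed splitmix64 generator standing in for random_integer() is purely an assumption for illustration; the real code would pull from the kernel's entropy pool.]

```c
#include <stdint.h>
#include <assert.h>

/* Stand-in PRNG (splitmix64, fixed seed) so the sketch runs on its
 * own; NOT the kernel's random_integer(). */
static uint64_t state = 12345;
static unsigned long random_integer(void)
{
	uint64_t z = (state += 0x9e3779b97f4a7c15ULL);
	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
	z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
	return (unsigned long)(z ^ (z >> 31));
}

/* [0, bound): reject raw values below 2^BITS_PER_LONG mod bound, so
 * the kept range is an exact multiple of bound; then reduce mod bound.
 * Costs two divisions per call, plus one random_integer() per retry. */
static unsigned long random_bounded(unsigned long bound)
{
	unsigned long ret, max_mod_bound;

	if (bound < 2)
		return 0;
	max_mod_bound = (1 + ~bound) % bound;	/* == -bound % bound */
	do
		ret = random_integer();
	while (ret < max_mod_bound);
	return ret % bound;
}
```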
* Re: [RFC PATCH] random: add get_random_max() function
  2019-03-24 20:47 ` [RFC PATCH] random: add get_random_max() function Jason A. Donenfeld
@ 2019-03-25  0:28 ` George Spelvin
  2019-03-25  1:14 ` George Spelvin
  1 sibling, 0 replies; 4+ messages in thread

From: George Spelvin @ 2019-03-25 0:28 UTC (permalink / raw)
To: Jason, lkml; +Cc: linux-kernel, tytso

On Sun, 24 Mar 2019 at 21:47:50 +0100, Jason A. Donenfeld wrote:
> I generally use a slightly simpler algorithm in various different projects:
>
> // [0, bound)
> static unsigned long random_bounded(unsigned long bound)
> {
> 	unsigned long ret, max_mod_bound;
>
> 	if (bound < 2)
> 		return 0;
> 	max_mod_bound = (1 + ~bound) % bound;
> 	do
> 		ret = random_integer();
> 	while (ret < max_mod_bound);
> 	return ret % bound;
> }
>
> Is the motivation behind using Lemire that you avoid the division (via
> the modulo) in favor of a multiplication?

Yes.  If we define eps = max_mod_bound * ldexp(1.0, -BITS_PER_LONG) as
the probability of one retry, and retries = eps / (1 - eps) as the
expected number of retries, then both algorithms take 1 + retries
random_integer()s.

The algorithm above always takes 2 divisions.  Divides are slow, and
usually not pipelined, so two in short succession incur a latency
penalty.

Lemire's multiplicative algorithm takes 1 multiplication on the fast
path (probability 1 - 2*eps on average), 1 additional division on the
slow path (probability 2*eps), and 1 multiplication per retry.

In the common case when bound is much less than ULONG_MAX, eps is tiny,
the fast path is taken almost all the time, and it's a huge win.

Even in the absolute worst case of bound = ULONG_MAX/2 + 2, when
eps ~ 0.5 (2 multiplies, 0.5 divides; there's no 2*eps penalty in this
case), it's faster as long as 2 multiplies cost less than 1.5 divides.

If you want simpler code, we could omit the fast path and still get a
speedup.  But a predictable branch for a divide seemed like a
worthwhile trade.
(FYI, this all came about as a side project of a kernel-janitor project
to replace "prandom_u32() % range" with "prandom_u32() * range >> 32".
I'm also annoyed that get_random_u32() and get_random_u64() have
separate buffers, even if EFFICIENT_UNALIGNED_ACCESS, but that's a
separate complaint.)
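[For concreteness, the deterministic core of the Lemire reduction described above can be sketched like this. The function name lemire_step() is made up here, the full algorithm simply loops while it reports a reject, and it assumes a compiler with the unsigned __int128 extension and a nonzero bound; this is a sketch, not the patch's code.]

```c
#include <stdint.h>
#include <assert.h>

/* One step of Lemire's reduction: map a full-range 64-bit x into
 * [0, bound) as the high 64 bits of the 128-bit product x * bound.
 * Returns 1 and writes the result when x is accepted; returns 0 when
 * the caller must draw a new x (probability (2^64 mod bound)/2^64).
 * The cheap low < bound test keeps the division off the fast path. */
static int lemire_step(uint64_t x, uint64_t bound, uint64_t *out)
{
	unsigned __int128 m = (unsigned __int128)x * bound;
	uint64_t low = (uint64_t)m;

	/* -bound % bound == 2^64 mod bound, the rejection threshold;
	 * it is always < bound, so low >= bound means accept for free. */
	if (low < bound && low < -bound % bound)
		return 0;	/* slow path: reject, retry */
	*out = (uint64_t)(m >> 64);
	return 1;
}
```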
* Re: [RFC PATCH] random: add get_random_max() function
  2019-03-24 20:47 ` [RFC PATCH] random: add get_random_max() function Jason A. Donenfeld
  2019-03-25  0:28 ` George Spelvin
@ 2019-03-25  1:14 ` George Spelvin
  1 sibling, 0 replies; 4+ messages in thread

From: George Spelvin @ 2019-03-25 1:14 UTC (permalink / raw)
To: Jason, lkml; +Cc: linux-kernel, tytso

P.S. The cited paper calls your algorithm the "OpenBSD algorithm" and
has a bunch of benchmarks comparing it to others in Fisher-Yates
shuffles of sizes 1e3..1e9.  Including all overhead (base PRNG,
shuffle), it's 3x slower for 32-bit operations and 8x slower for 64-bit
up to arrays of size 1e6, after which cache misses slow all algorithms,
reducing the ratio.

If you want a faster division-based algorithm, the "Java algorithm"
does 1 + retries divides:

unsigned long java(unsigned long s)
{
	unsigned long x, r;

	do {
		x = random_integer();
		r = x % s;
	} while (x - r > -s);
	return r;
}
[parent not found: <201903270655.x2R6tDjo020894@sdf.org>]
* Re: [RFC PATCH v2] random: add get_random_max() function
  [not found] ` <201903270655.x2R6tDjo020894@sdf.org>
@ 2019-03-28 10:00 ` George Spelvin
  0 siblings, 0 replies; 4+ messages in thread

From: George Spelvin @ 2019-03-28 10:00 UTC (permalink / raw)
To: linux-kernel, lkml, tytso; +Cc: Jason

By the way, I just noticed that my fallback get_random_max64()
algorithm (if there's no __int128 type) is completely broken and will
need rewriting.  It would work if I rejected and regenerated the high
half when the low half is out of range, but that's not what it does.
The worst case is a range of 0x100000001, where it would return
0x100000000 half the time.

Needs rethinking to find something as simple as possible.  I'm sure I
can come up with something, but I'm not averse to suggestions if anyone
has any.

(If I had a reliably fast clz/fls, that would open some possibilities,
but sigh...)
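[Not the patch's code, but for reference: the usual portable fallback for the high 64 bits of a 64x64-bit product, built from four 32x32->64 multiplies, looks like this. It supplies only the multiply; the rejection logic on the low half of the product is the part the message above says still needs rework.]

```c
#include <stdint.h>
#include <assert.h>

/* High 64 bits of a 64x64 -> 128-bit product without __int128:
 * split each operand into 32-bit halves, form the four partial
 * products, and propagate the carry out of the middle column. */
static uint64_t mul_u64_hi(uint64_t a, uint64_t b)
{
	uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
	uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
	uint64_t ll = a_lo * b_lo;
	uint64_t lh = a_lo * b_hi;
	uint64_t hl = a_hi * b_lo;
	uint64_t hh = a_hi * b_hi;
	/* Middle column: high half of ll plus the low halves of the
	 * cross terms; at most ~2^34, so it cannot itself overflow. */
	uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

	return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}
```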
end of thread, other threads: [~2019-03-28 10:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <201903241244.x2OCiL8P011277@sdf.org>
2019-03-24 20:47 ` [RFC PATCH] random: add get_random_max() function Jason A. Donenfeld
2019-03-25  0:28   ` George Spelvin
2019-03-25  1:14   ` George Spelvin
     [not found] ` <201903270655.x2R6tDjo020894@sdf.org>
2019-03-28 10:00   ` [RFC PATCH v2] random: add get_random_max() function George Spelvin