linux-mm.kvack.org archive mirror
* [PATCH] mm/shuffle.c: optimize add_to_free_area_random()
@ 2020-03-17 13:50 George Spelvin
  2020-03-17 21:44 ` Kees Cook
  0 siblings, 1 reply; 25+ messages in thread
From: George Spelvin @ 2020-03-17 13:50 UTC (permalink / raw)
  To: Dan Williams, linux-mm; +Cc: Kees Cook, Andrew Morton, lkml

First, use long rather than u64 for the bit buffer type, which
is significantly more efficient on 32-bit processors.

Second, avoid the need for a separate rand_bits counter.
rand_bits is never more than 63, so there's always room in rand
for a bit to mark the end of the available bits.  This makes the
shared state atomically updatable, which makes it a lot easier
to reason about race conditions.

Third, use READ_ONCE and WRITE_ONCE.  Without them, the compiler
may spill to the shared static in arbitrarily perverse ways,
and combined with the fact that the code eschews locking, that
is a recipe for hard-to-find bugs.  Now, a race might cause a bit
to be used twice, or get_random_long() to be called redundantly,
but it can't summon nasal daemons.

I've tried a few variants.  Keeping random lsbits with a most-
significant end marker, and using an explicit bool variable rather
than testing r, both increase code size slightly.

		x86_64	i386
This code		 94	 95
Explicit bool		103	 99
Lsbits		 99	101
Both		 96	100

Signed-off-by: George Spelvin <lkml@sdf.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
---
 mm/shuffle.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/mm/shuffle.c b/mm/shuffle.c
index b3fe97fd6654..0e4bf6a8da52 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -186,22 +186,25 @@ void __meminit __shuffle_free_memory(pg_data_t *pgdat)
 void add_to_free_area_random(struct page *page, struct free_area *area,
 		int migratetype)
 {
-	static u64 rand;
-	static u8 rand_bits;
+	static long rand;	/* 0..BITS_PER_LONG-1 buffered random bits */
+	long r = READ_ONCE(rand), rshift = r << 1;;
 
 	/*
-	 * The lack of locking is deliberate. If 2 threads race to
+	 * rand holds some random msbits, with the end marked by a 1 bit.
+	 * This allows us to maintain the pre-generated bits and the
+	 * count of bits in a single, atomically updatable, variable.
+	 *
+	 * The lack of locking is deliberate. If two threads race to
 	 * update the rand state it just adds to the entropy.
 	 */
-	if (rand_bits == 0) {
-		rand_bits = 64;
-		rand = get_random_u64();
+	if (unlikely(rshift == 0)) {
+		r = get_random_long();
+		rshift = r << 1 | 1;
 	}
+	WRITE_ONCE(rand, rshift);
 
-	if (rand & 1)
+	if (r < 0)
 		add_to_free_area(page, area, migratetype);
 	else
 		add_to_free_area_tail(page, area, migratetype);
-	rand_bits--;
-	rand >>= 1;
 }
-- 
2.25.1



* Re: [PATCH] mm/shuffle.c: optimize add_to_free_area_random()
  2020-03-17 13:50 [PATCH] mm/shuffle.c: optimize add_to_free_area_random() George Spelvin
@ 2020-03-17 21:44 ` Kees Cook
  2020-03-17 23:06   ` George Spelvin
  0 siblings, 1 reply; 25+ messages in thread
From: Kees Cook @ 2020-03-17 21:44 UTC (permalink / raw)
  To: George Spelvin; +Cc: Dan Williams, linux-mm, Andrew Morton

On Tue, Mar 17, 2020 at 01:50:35PM +0000, George Spelvin wrote:
> First, use long rather than u64 for the bit buffer type, which
> is significantly more efficient on 32-bit processors.

Why? This is twice as many get_random_*() calls for 32-bit: it'll save
whatever bits it doesn't use. (Speaking to word-sized stores, yes, that
makes sense, but see below...)

> Second, avoid the need for a separate rand_bits counter.
> rand_bits is never more than 63, so there's always room in rand
> for a bit to mark the end of the available bits.  This makes the
> shared state atomically updatable, which makes it a lot easier
> to reason about race conditions.

What are the race conditions? I had to go look, and I see that
add_to_free_area_random() is called under __free_one_page(), which is
under &zone->lock. Would it make more sense to store the randomness
per-zone and entirely side-step the issues?

> Third, use READ_ONCE and WRITE_ONCE.  Without them, the compiler
> may spill to the shared static in arbitrarily perverse ways,
> and combined with the fact that the code eschews locking, that
> is a recipe for hard-to-find bugs.  Now, a race might cause a bit
> to be used twice, or get_random_long() to be called redundantly,
> but it can't summon nasal daemons.

All these things might make the results less predictable, too. :) But,
yes, if there was deterministic sharing of random values, that'd be
bad. But this patch doesn't really solve the "use the same random bits"
problem, which is the very thing that might get us into a predictable
state. If two zones both call READ_ONCE(), use the same value (eek),
and then both call WRITE_ONCE(), one of them wins the write (no big deal).

-- 
Kees Cook



* Re: [PATCH] mm/shuffle.c: optimize add_to_free_area_random()
  2020-03-17 21:44 ` Kees Cook
@ 2020-03-17 23:06   ` George Spelvin
  2020-03-17 23:38     ` Kees Cook
  0 siblings, 1 reply; 25+ messages in thread
From: George Spelvin @ 2020-03-17 23:06 UTC (permalink / raw)
  To: Kees Cook; +Cc: Dan Williams, linux-mm, Andrew Morton, lkml

On Tue, Mar 17, 2020 at 02:44:33PM -0700, Kees Cook wrote:
> On Tue, Mar 17, 2020 at 01:50:35PM +0000, George Spelvin wrote:
> > First, use long rather than u64 for the bit buffer type, which
> > is significantly more efficient on 32-bit processors.
> 
> Why? This is twice as many get_random_*() calls for 32-bit: it'll save
> whatever bits it doesn't use. (Speaking to word-sized stores, yes, that
> makes sense, but see below...)

Yes, but the per-call overhead is fairly low until you run out of the
64-byte buffer; then there's an expensive crypto operation to
refill the buffer.

So two get_random_u32 calls are not a lot more than one get_random_u64.
(The buffers are per-CPU, so they are pretty cheap to access.)  (I have
other patches I'm working on to speed up the locking they have, but this
one is independent, so I'm sending it separately.)

And if you want faster, prandom_u32() is probably good enough for this
application.

I did it to speed up and shrink the fast path.  It saves at least
3 instructions on 32-bit machines (one load, one rotate through carry, 
and one store), at the cost of one extra get_random_*() call per 64
bits.  If get_random_u32()'s fast path is less than 192 instructions,
it's a win.

It's also relevant to the race condition reasoning mentioned below;
a one-word value can be atomically updated.  On a 32-bit machine,
64-bit operations get tearing and it's a PITA to reason about.
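
For instance, with the current static u64, even the simple update is a
multi-word read-modify-write on a 32-bit machine.  A sketch of the
effect (not the exact code GCC emits):

	static u64 rand;
	...
	rand >>= 1;	/* on i386: load two 32-bit halves, shift across
			 * them, store two halves.  A reader running
			 * between the two stores sees a torn value.
			 * A long is one word, so it can't tear. */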

>> Second, avoid the need for a separate rand_bits counter.
>> rand_bits is never more than 63, so there's always room in rand
>> for a bit to mark the end of the available bits.  This makes the
>> shared state atomically updatable, which makes it a lot easier
>> to reason about race conditions.
> 
> What are the race conditions? I had to go look at I see that
> add_to_free_area_random() is called under __free_one_page() which is
> under &zone->lock. Would it make more sense to store the randomness
> per-zone and entirely side-step the issues?

The most serious is that if two threads simultaneously observe
rand_bits == 1, but do their decrements separately, you end up
with rand_bits = 255 and you generate 255 consecutive 0 bits
before refilling the buffer.
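
Spelled out against the old code, the bad interleaving is roughly as
follows (a sketch of the sequence of events, not the exact generated
code):

	/* both CPUs enter with rand_bits == 1 */
	if (rand_bits == 0) {	/* both read 1, so neither refills */
		...
	}
	if (rand & 1)		/* both consume the one remaining bit */
		...
	rand_bits--;	/* CPU A stores 0; CPU B then loads 0 and stores
			 * 255 (the u8 wraps around) */
	rand >>= 1;	/* rand is now 0, so the next 255 calls all see a 0 bit */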

Since we're only generating random bits, a screwed-up answer occasionally 
doesn't really matter (like the comment says "lack of locking is 
deliberate"), but 255 screwed up bits is a bit much.

I avoided changing the underlying locking model because I didn't
feel up to such an invasive change; I managed to fix the problems
I saw without going there.  And shrink the code; that seemed like
enough of a win to justify it to me.

If you want to change the locking, the other alternative to per-zone is 
per-CPU, which also avoids locking.  We could even add a generic 
get_random_bit() function.
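
Just to make that concrete, a sketch of what such a helper could look
like (get_random_bit() is purely hypothetical, and the read/modify/write
is not atomic against preemption or migration, which is fine here
because all we want is entropy):

	/* would need <linux/percpu.h> and <linux/random.h> */
	static DEFINE_PER_CPU(unsigned long, rand_bit_buf); /* msbits + end marker */

	static bool get_random_bit(void)
	{
		unsigned long r = this_cpu_read(rand_bit_buf);
		unsigned long next = r << 1;

		if (unlikely(next == 0)) {	/* only the marker (or nothing) left */
			r = get_random_long();
			next = r << 1 | 1;	/* append a fresh end marker */
		}
		this_cpu_write(rand_bit_buf, next);

		return (long)r < 0;		/* consume the msbit */
	}

Then add_to_free_area_random() would reduce to calling get_random_bit()
and picking head or tail.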

>> Third, use READ_ONCE and WRITE_ONCE.  Without them, the compiler
>> may spill to the shared static in arbitrarily perverse ways,
>> and combined with the fact that the code eschews locking, that
>> is a recipe for hard-to-find bugs.  Now, a race might cause a bit
>> to be used twice, or get_random_long() to be called redundantly,
>> but it can't summon nasal daemons.
> 
> All these things might make the results less predictable, too. :) But,
> yes, if there was deterministic sharing of random values, that'd be
> bad. But this patch doesn't really solve the "use the same random bits"
> problem, which is the very thing that might get us into a predictable
> state. If two zones both call READ_ONCE(), use the same value (eek),
> and then both call WRITE_ONCE(), one of them wins the write (no big deal).

Absolutely, but in case of a tight race, only *one* bit is used twice,
and the bit used is uniformly distributed, so it's a really small
correlation.  Unlike the 255x zero race the existing code can have.

The main reason I did this is to, as I already wrote, keep the nasal 
demons at bay.  The existing code does things (having another thread 
change a non-volatile variable while this code is executing) that are 
explicitly undefined by the ANSI C standard.
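
A concrete, if tame, example of what the compiler is free to do without
READ_ONCE/WRITE_ONCE: nothing stops it from loading the shared static
twice in the existing code:

	if (rand & 1)			/* load #1 of rand */
		add_to_free_area(page, area, migratetype);
	else
		add_to_free_area_tail(page, area, migratetype);
	rand >>= 1;			/* load #2 + store; if another CPU
					 * changed rand in between, the bit
					 * just tested and the bit now
					 * discarded are different bits */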

The compiler is allowed to (in John Woods' memorable explanation)
produce code that makes demons fly out of your nose.  (More plausibly,
it may simply crash.)

It probably won't, but I prefer to stay on the safe side of the
language specification contract.  Linus has ranted on more than one 
occasion about the perverse ways that GCC has used its permission
to assume that people won't violate the contract.



* Re: [PATCH] mm/shuffle.c: optimize add_to_free_area_random()
  2020-03-17 23:06   ` George Spelvin
@ 2020-03-17 23:38     ` Kees Cook
  2020-03-18  1:44       ` [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random() George Spelvin
  0 siblings, 1 reply; 25+ messages in thread
From: Kees Cook @ 2020-03-17 23:38 UTC (permalink / raw)
  To: George Spelvin; +Cc: Dan Williams, linux-mm, Andrew Morton

On Tue, Mar 17, 2020 at 11:06:12PM +0000, George Spelvin wrote:
> The most serious is that if two threads simultaneously observe
> rand_bits == 1, but do their decrements separately, you end up
> with rand_bits = 255 and you generate 255 consecutive 0 bits
> before refilling the buffer.
> 
> Since we're only generating random bits, a screwed-up answer occasionally 
> doesn't really matter (like the comment says "lack of locking is 
> deliberate"), but 255 screwed up bits is a bit much.

Okay, I'm on board! :) Thanks for spelling this race out; I hadn't seen
quite how nasty it could get. (Perhaps mention in the commit log for v2?)

> I avoided changing the underlying locking model because I didn't
> feel up to such an invasive change; I managed to fix the problems
> I saw without going there.  And shrink the code; tht seemed like
> enough of a win to justify it to me.

Fair enough.

> The compiler is allowed to (in John Woods' memorable explanation)
> produce code that makes demons fly out of your nose.  (More plausibly,
> it may simply crash.)

So one thing that I see here that is still in the nasal demon realm is
that the left-shift of a signed value, which is technically undefined
behavior in C. (See the comment on check_shl_overflow().)

Doing a signedness check is very cheap in the resulting machine code,
but is sticking to unsigned and reversing direction for a bottom-bit
test too bad?

i.e.:

	static unsigned long rand_bits;
	unsigned long r = READ_ONCE(rand_bits), rshift = r >> 1;

	if (unlikely(rshift == 0)) {
		r = get_random_long();
		rshift = (r >> 1) | (0x1UL << (BITS_PER_LONG - 1));
	}
	WRITE_ONCE(rand_bits, rshift);

	if (r & 1)
		add_to...
	else
		add_to...tail



-- 
Kees Cook



* [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-17 23:38     ` Kees Cook
@ 2020-03-18  1:44       ` George Spelvin
  2020-03-18  1:49         ` Randy Dunlap
                           ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: George Spelvin @ 2020-03-18  1:44 UTC (permalink / raw)
  To: Kees Cook; +Cc: Dan Williams, linux-mm, Andrew Morton, lkml

The old code had separate "rand" and "rand_count" variables,
which could get out of sync with bad results.

In the worst case, two threads would see rand_count == 1 and
both decrement it, resultint in rand_count = 255 and rand being
filled with zeros for the next 255 calls.

Instead, pack them both into a single, atomically updatable,
variable.  This makes it a lot easier to reason about race
conditions.  They are still there - the code deliberately eschews
locking - but basically harmless on the rare occasions that
they happen.

Second, use READ_ONCE and WRITE_ONCE.  Without them, we are deep
in the land of nasal demons.  The compiler would be free to spill
temporaries to the static variables in arbitrary perverse ways
and create hard-to-find bugs.

(Alternatively, we could declare the static variable "volatile",
one of the few places in the Linux kernel that would be correct,
but it would probably annoy Linus.)

Third, use long rather than u64.  This not only keeps the
state atomically updatable, it also speeds up the fast path
on 32-bit machines.  Saving at least three instructions on
the fast path (one load, one add-with-carry, and one store)
is worth exchanging one call to get_random_u64 for two
calls to get_random_u32.  The fast path of get_random_* is
less than the 3*64 = 192 instructions saved, and the slow
path happens every 64 bytes so isn't affectrd by the change.

I've tried a few variants.  Keeping random lsbits with
a most-significant end marker, and using an explicit bool
flag rather than testing r both increase code size slightly.

		x86_64	i386
This code		 94	 95
Explicit bool		103	 99
Lsbits		 99	101
Both		 96	100

Signed-off-by: George Spelvin <lkml@sdf.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
---
 mm/shuffle.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/mm/shuffle.c b/mm/shuffle.c
index e0ed247f8d90..4ba3ba84764d 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -186,22 +186,28 @@ void __meminit __shuffle_free_memory(pg_data_t *pgdat)
 void add_to_free_area_random(struct page *page, struct free_area *area,
 		int migratetype)
 {
-	static u64 rand;
-	static u8 rand_bits;
+	static long rand;	/* 0..BITS_PER_LONG-1 buffered random bits */
+	unsigned long r = READ_ONCE(rand), rshift = r << 1;;
 
 	/*
-	 * The lack of locking is deliberate. If 2 threads race to
-	 * update the rand state it just adds to the entropy.
+	 * rand holds some random msbits, with a 1 bit appended, followed
+	 * by zero-padding in the lsbits.  This allows us to maintain
+	 * the pre-generated bits and the count of bits in a single,
+	 * atomically updatable, variable.
+	 *
+	 * The lack of locking is deliberate. If two threads race to
+	 * update the rand state it just adds to the entropy.  The
+	 * worst that can happen is a random bit is used twice, or
+	 * get_random_long is called redundantly.
 	 */
-	if (rand_bits == 0) {
-		rand_bits = 64;
-		rand = get_random_u64();
+	if (unlikely(rshift == 0)) {
+		r = get_random_long();
+		rshift = r << 1 | 1;
 	}
+	WRITE_ONCE(rand, rshift);
 
-	if (rand & 1)
+	if ((long)r < 0)
 		add_to_free_area(page, area, migratetype);
 	else
 		add_to_free_area_tail(page, area, migratetype);
-	rand_bits--;
-	rand >>= 1;
 }
-- 
2.26.0.rc2



* Re: [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18  1:44       ` [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random() George Spelvin
@ 2020-03-18  1:49         ` Randy Dunlap
  2020-03-18  3:53         ` Dan Williams
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 25+ messages in thread
From: Randy Dunlap @ 2020-03-18  1:49 UTC (permalink / raw)
  To: George Spelvin, Kees Cook; +Cc: Dan Williams, linux-mm, Andrew Morton

Hi,

On 3/17/20 6:44 PM, George Spelvin wrote:
> +	unsigned long r = READ_ONCE(rand), rshift = r << 1;;

don't need the double ending ;;

-- 
~Randy




* Re: [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18  1:44       ` [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random() George Spelvin
  2020-03-18  1:49         ` Randy Dunlap
@ 2020-03-18  3:53         ` Dan Williams
  2020-03-18  8:20           ` George Spelvin
  2020-03-18  3:58         ` Kees Cook
                           ` (2 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Dan Williams @ 2020-03-18  3:53 UTC (permalink / raw)
  To: George Spelvin; +Cc: Kees Cook, Linux MM, Andrew Morton

On Tue, Mar 17, 2020 at 6:44 PM George Spelvin <lkml@sdf.org> wrote:
>
> The old code had separate "rand" and "rand_count" variables,
> which could get out of sync with bad results.
>
> In the worst case, two threads would see rand_count == 1 and
> both decrement it, resultint in rand_count = 255 and rand being
> filled with zeros for the next 255 calls.
>
> Instead, pack them both into a single, atomically updatable,
> variable.  This makes it a lot easier to reason about race
> conditions.  They are still there - the code deliberately eschews
> locking - but basically harmless on the rare occasions that
> they happen.
>
> Second, use READ_ONCE and WRITE_ONCE.  Without them, we are deep
> in the land of nasal demons.  The compiler would be free to spill
> temporaries to the static variables in arbitrary perverse ways
> and create hard-to-find bugs.
>
> (Alternatively, we could declare the static variable "volatile",
> one of the few places in the Linux kernel that would be correct,
> but it would probably annoy Linus.)
>
> Third, use long rather than u64.  This not only keeps the
> state atomically updatable, it also speeds up the fast path
> on 32-bit machines.  Saving at least three instructions on
> the fast path (one load, one add-with-carry, and one store)
> is worth exchanging one call to get_random_u64 for two
> calls to get_random_u32.  The fast path of get_random_* is
> less than the 3*64 = 192 instructions saved, and the slow
> path happens every 64 bytes so isn't affectrd by the change.
>
> I've tried a few variants.  Keeping random lsbits with
> a most-significant end marker, and using an explicit bool
> flag rather than testing r both increase code size slightly.
>
>                 x86_64  i386
> This code                94      95
> Explicit bool           103      99
> Lsbits           99     101
> Both             96     100
>
> Signed-off-by: George Spelvin <lkml@sdf.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org
> ---
>  mm/shuffle.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/mm/shuffle.c b/mm/shuffle.c
> index e0ed247f8d90..4ba3ba84764d 100644
> --- a/mm/shuffle.c
> +++ b/mm/shuffle.c
> @@ -186,22 +186,28 @@ void __meminit __shuffle_free_memory(pg_data_t *pgdat)
>  void add_to_free_area_random(struct page *page, struct free_area *area,
>                 int migratetype)
>  {
> -       static u64 rand;
> -       static u8 rand_bits;
> +       static long rand;       /* 0..BITS_PER_LONG-1 buffered random bits */
> +       unsigned long r = READ_ONCE(rand), rshift = r << 1;;
>
>         /*
> -        * The lack of locking is deliberate. If 2 threads race to
> -        * update the rand state it just adds to the entropy.
> +        * rand holds some random msbits, with a 1 bit appended, followed
> +        * by zero-padding in the lsbits.  This allows us to maintain
> +        * the pre-generated bits and the count of bits in a single,
> +        * atomically updatable, variable.
> +        *
> +        * The lack of locking is deliberate. If two threads race to
> +        * update the rand state it just adds to the entropy.  The
> +        * worst that can happen is a random bit is used twice, or
> +        * get_random_long is called redundantly.
>          */
> -       if (rand_bits == 0) {
> -               rand_bits = 64;
> -               rand = get_random_u64();
> +       if (unlikely(rshift == 0)) {

I had the impression that unless unlikely is "mostly never" then it
can do more harm than good. Is a branch guaranteed to be taken every
BITS_PER_LONG'th occurrence really a candidate for unlikely()
annotation?

Otherwise this change looks good to me.

> +               r = get_random_long();
> +               rshift = r << 1 | 1;
>         }
> +       WRITE_ONCE(rand, rshift);
>
> -       if (rand & 1)
> +       if ((long)r < 0)
>                 add_to_free_area(page, area, migratetype);
>         else
>                 add_to_free_area_tail(page, area, migratetype);
> -       rand_bits--;
> -       rand >>= 1;
>  }
> --
> 2.26.0.rc2



* Re: [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18  1:44       ` [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random() George Spelvin
  2020-03-18  1:49         ` Randy Dunlap
  2020-03-18  3:53         ` Dan Williams
@ 2020-03-18  3:58         ` Kees Cook
  2020-03-18 15:26         ` Alexander Duyck
  2020-03-18 20:39         ` [PATCH v3] " George Spelvin
  4 siblings, 0 replies; 25+ messages in thread
From: Kees Cook @ 2020-03-18  3:58 UTC (permalink / raw)
  To: George Spelvin; +Cc: Dan Williams, linux-mm, Andrew Morton

On Wed, Mar 18, 2020 at 01:44:10AM +0000, George Spelvin wrote:
> The old code had separate "rand" and "rand_count" variables,
> which could get out of sync with bad results.
> 
> In the worst case, two threads would see rand_count == 1 and
> both decrement it, resultint in rand_count = 255 and rand being

typo: resultint -> resulting

> filled with zeros for the next 255 calls.
> 
> Instead, pack them both into a single, atomically updatable,
> variable.  This makes it a lot easier to reason about race
> conditions.  They are still there - the code deliberately eschews
> locking - but basically harmless on the rare occasions that
> they happen.
> 
> Second, use READ_ONCE and WRITE_ONCE.  Without them, we are deep
> in the land of nasal demons.  The compiler would be free to spill
> temporaries to the static variables in arbitrary perverse ways
> and create hard-to-find bugs.
> 
> (Alternatively, we could declare the static variable "volatile",
> one of the few places in the Linux kernel that would be correct,
> but it would probably annoy Linus.)
> 
> Third, use long rather than u64.  This not only keeps the
> state atomically updatable, it also speeds up the fast path
> on 32-bit machines.  Saving at least three instructions on
> the fast path (one load, one add-with-carry, and one store)
> is worth exchanging one call to get_random_u64 for two
> calls to get_random_u32.  The fast path of get_random_* is
> less than the 3*64 = 192 instructions saved, and the slow
> path happens every 64 bytes so isn't affectrd by the change.
> 
> I've tried a few variants.  Keeping random lsbits with
> a most-significant end marker, and using an explicit bool
> flag rather than testing r both increase code size slightly.
> 
> 		x86_64	i386
> This code		 94	 95
> Explicit bool		103	 99
> Lsbits		 99	101
> Both		 96	100
> 
> Signed-off-by: George Spelvin <lkml@sdf.org>

And with Randy's other fix, please consider this:

Acked-by: Kees Cook <keescook@chromium.org>

-Kees

> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org
> ---
>  mm/shuffle.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/shuffle.c b/mm/shuffle.c
> index e0ed247f8d90..4ba3ba84764d 100644
> --- a/mm/shuffle.c
> +++ b/mm/shuffle.c
> @@ -186,22 +186,28 @@ void __meminit __shuffle_free_memory(pg_data_t *pgdat)
>  void add_to_free_area_random(struct page *page, struct free_area *area,
>  		int migratetype)
>  {
> -	static u64 rand;
> -	static u8 rand_bits;
> +	static long rand;	/* 0..BITS_PER_LONG-1 buffered random bits */
> +	unsigned long r = READ_ONCE(rand), rshift = r << 1;;
>  
>  	/*
> -	 * The lack of locking is deliberate. If 2 threads race to
> -	 * update the rand state it just adds to the entropy.
> +	 * rand holds some random msbits, with a 1 bit appended, followed
> +	 * by zero-padding in the lsbits.  This allows us to maintain
> +	 * the pre-generated bits and the count of bits in a single,
> +	 * atomically updatable, variable.
> +	 *
> +	 * The lack of locking is deliberate. If two threads race to
> +	 * update the rand state it just adds to the entropy.  The
> +	 * worst that can happen is a random bit is used twice, or
> +	 * get_random_long is called redundantly.
>  	 */
> -	if (rand_bits == 0) {
> -		rand_bits = 64;
> -		rand = get_random_u64();
> +	if (unlikely(rshift == 0)) {
> +		r = get_random_long();
> +		rshift = r << 1 | 1;
>  	}
> +	WRITE_ONCE(rand, rshift);
>  
> -	if (rand & 1)
> +	if ((long)r < 0)
>  		add_to_free_area(page, area, migratetype);
>  	else
>  		add_to_free_area_tail(page, area, migratetype);
> -	rand_bits--;
> -	rand >>= 1;
>  }
> -- 
> 2.26.0.rc2

-- 
Kees Cook



* Re: [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18  3:53         ` Dan Williams
@ 2020-03-18  8:20           ` George Spelvin
  2020-03-18 17:36             ` Dan Williams
  0 siblings, 1 reply; 25+ messages in thread
From: George Spelvin @ 2020-03-18  8:20 UTC (permalink / raw)
  To: Dan Williams; +Cc: Kees Cook, Linux MM, Andrew Morton, lkml

On Tue, Mar 17, 2020 at 08:53:55PM -0700, Dan Williams wrote:
> On Tue, Mar 17, 2020 at 6:44 PM George Spelvin <lkml@sdf.org> wrote:
>> -       if (rand_bits == 0) {
>> -               rand_bits = 64;
>> -               rand = get_random_u64();
>> +       if (unlikely(rshift == 0)) {
> 
> I had the impression that unless unlikely is "mostly never" then it
> can do more harm than good. Is a branch guaranteed to be taken every
> BITS_PER_LONG'th occurrence really a candidate for unlikely()
> annotation?

I had to look this up.  GCC manual:

For the purposes of branch prediction optimizations, the probability
that a '__builtin_expect' expression is 'true' is controlled by GCC's
'builtin-expect-probability' parameter, which defaults to 90%.  You can
also use '__builtin_expect_with_probability' to explicitly assign a
probability value to individual expressions.

So I think that <= 10% is good enough, which is true in this case.
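
For reference, the kernel's hint macros boil down to roughly the
following (the real definitions in include/linux/compiler.h grow
branch-profiling hooks in some configs), and GCC 9 and later can also
be told the probability directly:

	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)

	/* GCC 9+: state the 1/BITS_PER_LONG probability explicitly */
	if (__builtin_expect_with_probability(rshift == 0, 1, 1.0 / BITS_PER_LONG))
		...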

I was trying to encourage the compiler to:
* Place this code path out of line, and
* Not do the stack manipulations (build a frame, spill registers)
  needed for a non-leaf function if this path isn't taken.



* Re: [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18  1:44       ` [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random() George Spelvin
                           ` (2 preceding siblings ...)
  2020-03-18  3:58         ` Kees Cook
@ 2020-03-18 15:26         ` Alexander Duyck
  2020-03-18 18:35           ` George Spelvin
  2020-03-18 20:39         ` [PATCH v3] " George Spelvin
  4 siblings, 1 reply; 25+ messages in thread
From: Alexander Duyck @ 2020-03-18 15:26 UTC (permalink / raw)
  To: George Spelvin; +Cc: Kees Cook, Dan Williams, linux-mm, Andrew Morton

On Tue, Mar 17, 2020 at 6:44 PM George Spelvin <lkml@sdf.org> wrote:
>
> The old code had separate "rand" and "rand_count" variables,
> which could get out of sync with bad results.
>
> In the worst case, two threads would see rand_count == 1 and
> both decrement it, resultint in rand_count = 255 and rand being
> filled with zeros for the next 255 calls.
>
> Instead, pack them both into a single, atomically updatable,
> variable.  This makes it a lot easier to reason about race
> conditions.  They are still there - the code deliberately eschews
> locking - but basically harmless on the rare occasions that
> they happen.
>
> Second, use READ_ONCE and WRITE_ONCE.  Without them, we are deep
> in the land of nasal demons.  The compiler would be free to spill
> temporaries to the static variables in arbitrary perverse ways
> and create hard-to-find bugs.
>
> (Alternatively, we could declare the static variable "volatile",
> one of the few places in the Linux kernel that would be correct,
> but it would probably annoy Linus.)
>
> Third, use long rather than u64.  This not only keeps the
> state atomically updatable, it also speeds up the fast path
> on 32-bit machines.  Saving at least three instructions on
> the fast path (one load, one add-with-carry, and one store)
> is worth exchanging one call to get_random_u64 for two
> calls to get_random_u32.  The fast path of get_random_* is
> less than the 3*64 = 192 instructions saved, and the slow
> path happens every 64 bytes so isn't affectrd by the change.
>
> I've tried a few variants.  Keeping random lsbits with
> a most-significant end marker, and using an explicit bool
> flag rather than testing r both increase code size slightly.
>
>                 x86_64  i386
> This code                94      95
> Explicit bool           103      99
> Lsbits           99     101
> Both             96     100
>
> Signed-off-by: George Spelvin <lkml@sdf.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org
> ---
>  mm/shuffle.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/mm/shuffle.c b/mm/shuffle.c
> index e0ed247f8d90..4ba3ba84764d 100644
> --- a/mm/shuffle.c
> +++ b/mm/shuffle.c
> @@ -186,22 +186,28 @@ void __meminit __shuffle_free_memory(pg_data_t *pgdat)
>  void add_to_free_area_random(struct page *page, struct free_area *area,
>                 int migratetype)
>  {
> -       static u64 rand;
> -       static u8 rand_bits;
> +       static long rand;       /* 0..BITS_PER_LONG-1 buffered random bits */
> +       unsigned long r = READ_ONCE(rand), rshift = r << 1;;
>
>         /*
> -        * The lack of locking is deliberate. If 2 threads race to
> -        * update the rand state it just adds to the entropy.
> +        * rand holds some random msbits, with a 1 bit appended, followed
> +        * by zero-padding in the lsbits.  This allows us to maintain
> +        * the pre-generated bits and the count of bits in a single,
> +        * atomically updatable, variable.
> +        *
> +        * The lack of locking is deliberate. If two threads race to
> +        * update the rand state it just adds to the entropy.  The
> +        * worst that can happen is a random bit is used twice, or
> +        * get_random_long is called redundantly.
>          */
> -       if (rand_bits == 0) {
> -               rand_bits = 64;
> -               rand = get_random_u64();
> +       if (unlikely(rshift == 0)) {
> +               r = get_random_long();
> +               rshift = r << 1 | 1;

You might want to wrap the "r << 1" in parenthesis. Also you could
probably use a + 1 instead of an | 1.

>         }
> +       WRITE_ONCE(rand, rshift);
>
> -       if (rand & 1)
> +       if ((long)r < 0)

One trick you might be able to get away with here is to actually
compare r to rshift. "If (rshift <= r)" should give you the same
result. This works since what you are essentially doing is just adding
r to itself so if you overflow rshift will be equal to at most r - 1.
However with the addition of the single bit in the rshift == 0 case it
could potentially be equal in the unlikely case of r being all 1's.

>                 add_to_free_area(page, area, migratetype);
>         else
>                 add_to_free_area_tail(page, area, migratetype);
> -       rand_bits--;
> -       rand >>= 1;
>  }
> --
> 2.26.0.rc2
>



* Re: [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18  8:20           ` George Spelvin
@ 2020-03-18 17:36             ` Dan Williams
  2020-03-18 19:29               ` George Spelvin
  0 siblings, 1 reply; 25+ messages in thread
From: Dan Williams @ 2020-03-18 17:36 UTC (permalink / raw)
  To: George Spelvin; +Cc: Kees Cook, Linux MM, Andrew Morton

On Wed, Mar 18, 2020 at 1:20 AM George Spelvin <lkml@sdf.org> wrote:
>
> On Tue, Mar 17, 2020 at 08:53:55PM -0700, Dan Williams wrote:
> > On Tue, Mar 17, 2020 at 6:44 PM George Spelvin <lkml@sdf.org> wrote:
> >> -       if (rand_bits == 0) {
> >> -               rand_bits = 64;
> >> -               rand = get_random_u64();
> >> +       if (unlikely(rshift == 0)) {
> >
> > I had the impression that unless unlikely is "mostly never" then it
> > can do more harm than good. Is a branch guaranteed to be taken every
> > BITS_PER_LONG'th occurrence really a candidate for unlikely()
> > annotation?
>
> I had to look this up.  GCC manual:
>
> For the purposes of branch prediction optimizations, the probability
> that a '__builtin_expect' expression is 'true' is controlled by GCC's
> 'builtin-expect-probability' parameter, which defaults to 90%.  You can
> also use '__builtin_expect_with_probability' to explicitly assign a
> probability value to individual expressions.
>
> So I think that <= 10% is good enough, which is true in this case.
>
> I was tring to encourage the compiler to:
> * Place this code path out of line, and
> * Not do the stack manipulations (build a frame, spill registers)
>   needed for a non-leaf function if this path isn't taken.

Understood, I think it's ok in this case because the shuffling only
happens for order-10 page free events by default so it will be
difficult to measure the perf impact either way.  But in other kernel
contexts I think unlikely() annotation should come with numbers; 90%
not taken is not sufficient in and of itself.

You can add:

Acked-by: Dan Williams <dan.j.williams@intel.com>



* Re: [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18 15:26         ` Alexander Duyck
@ 2020-03-18 18:35           ` George Spelvin
  2020-03-18 19:17             ` Alexander Duyck
  0 siblings, 1 reply; 25+ messages in thread
From: George Spelvin @ 2020-03-18 18:35 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Kees Cook, Dan Williams, linux-mm, Andrew Morton, lkml

On Wed, Mar 18, 2020 at 08:26:06AM -0700, Alexander Duyck wrote:
> On Tue, Mar 17, 2020 at 6:44 PM George Spelvin <lkml@sdf.org> wrote:
>> +       if (unlikely(rshift == 0)) {
>> +               r = get_random_long();
>> +               rshift = r << 1 | 1;
> 
> You might want to wrap the "r << 1" in parenthesis. Also you could
> probably use a + 1 instead of an | 1.

I could, but what would it matter?  I have just confirmed that all of:
	x << 1 | 1;
	(x << 1) + 1;
	x + x + 1;
	x + x | 1;
	2*x + 1;
	2*x | 1;
compile to
	leal	1(%rdi,%rdi), %eax

on x86, and two instructions on every other processor I can think of. 

Since this is conceptually a bit-manipulation operation where carry
propagation is undesirable, the logical operation form seems the most
natural way to write it.

As for the parens, all C programmers are forced to remember that the
bitwise operators have weirdly low precedence (below < <= == >= >),
so there's no risk of confusion.

>>         }
>> +       WRITE_ONCE(rand, rshift);
>>
>> -       if (rand & 1)
>> +       if ((long)r < 0)
> 
> One trick you might be able to get away with here is to actually
> compare r to rshift. "If (rshift <= r)" should give you the same
> result. This works since what you are essentially doing is just adding
> r to itself so if you overflow rshift will be equal to at most r - 1.
> However with the addition of the single bit in the rshift == 0 case it
> could potentially be equal in the unlikely case of r being all 1's.

Er... but why would I want to?  On most processors, "branch on sign bit"
is a single instruction, and that's the instruction I'm hoping the 
compiler will generate.

That's why I changed the shift direction from the original right (testing
the lsbit) to left (testing the msbit): slight code size reduction.

Anything else produces larger and slower object code, for no benefit.



* Re: [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18 18:35           ` George Spelvin
@ 2020-03-18 19:17             ` Alexander Duyck
  2020-03-18 20:06               ` George Spelvin
  0 siblings, 1 reply; 25+ messages in thread
From: Alexander Duyck @ 2020-03-18 19:17 UTC (permalink / raw)
  To: George Spelvin; +Cc: Kees Cook, Dan Williams, linux-mm, Andrew Morton

On Wed, Mar 18, 2020 at 11:35 AM George Spelvin <lkml@sdf.org> wrote:
>
> On Wed, Mar 18, 2020 at 08:26:06AM -0700, Alexander Duyck wrote:
> > On Tue, Mar 17, 2020 at 6:44 PM George Spelvin <lkml@sdf.org> wrote:
> >> +       if (unlikely(rshift == 0)) {
> >> +               r = get_random_long();
> >> +               rshift = r << 1 | 1;
> >
> > You might want to wrap the "r << 1" in parenthesis. Also you could
> > probably use a + 1 instead of an | 1.
>
> I could, but what would it matter?  I have just confirmed that all of:
>         x << 1 | 1;
>         (x << 1) + 1;
>         x + x + 1;
>         x + x | 1;
>         2*x + 1;
>         2*x | 1;
> compile to
>         leal    1(%rdi,%rdi), %eax
>
> on x86, and two instructions on every other processor I can think of.
>
> Since this is concpetually a bit-manipulation operation where carry
> propagation is undesirable, the logical operation form seems the most
> natural way to write it.
>
> As for the parens, all C programmers are forced to remember that the
> boolean operators have weirdly low precedence (below < <= == >= >),
> so there's no risk of confusion.

Okay, if this is coming out the same regardless I suppose it doesn't
matter. I am still not a huge fan of skipping the parentheses, as I
feel it is a bit uglier to read but that isn't anything that has to be
fixed.

> >>         }
> >> +       WRITE_ONCE(rand, rshift);
> >>
> >> -       if (rand & 1)
> >> +       if ((long)r < 0)
> >
> > One trick you might be able to get away with here is to actually
> > compare r to rshift. "If (rshift <= r)" should give you the same
> > result. This works since what you are essentially doing is just adding
> > r to itself so if you overflow rshift will be equal to at most r - 1.
> > However with the addition of the single bit in the rshift == 0 case it
> > could potentially be equal in the unlikely case of r being all 1's.
>
> Er... but why would I want to?  On most processors, "branch on sign bit"
> is a single instruction, and that's the instruction I'm hoping the
> compiler will generate.
>
> That's why I changed the shift direction from the original right (testing
> the lsbit) to left (testing the msbit): slight code size reduction.
>
> Anything else produces larger and slower object code, for no benefit.

I was just putting it out there as a possibility. What I have seen in
the past is that under some circumstances gcc can be smart enough to
interpret that as a "branch on carry". My thought was you are likely
having to test the value against itself and then you might be able to
make use of shift and carry flag to avoid that. In addition you could
get away from having to recast an unsigned value as a signed one in
order to perform the bit test.



* Re: [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18 17:36             ` Dan Williams
@ 2020-03-18 19:29               ` George Spelvin
  2020-03-18 19:40                 ` Dan Williams
  0 siblings, 1 reply; 25+ messages in thread
From: George Spelvin @ 2020-03-18 19:29 UTC (permalink / raw)
  To: Dan Williams; +Cc: Kees Cook, Linux MM, Andrew Morton, lkml

On Wed, Mar 18, 2020 at 10:36:10AM -0700, Dan Williams wrote:
> On Wed, Mar 18, 2020 at 1:20 AM George Spelvin <lkml@sdf.org> wrote:
>> On Tue, Mar 17, 2020 at 08:53:55PM -0700, Dan Williams wrote:
>>> I had the impression that unless unlikely is "mostly never" then it
>>> can do more harm than good. Is a branch guaranteed to be taken every
>>> BITS_PER_LONG'th occurrence really a candidate for unlikely()
>>> annotation?
>>
>> I had to look this up.  GCC manual:
>>
>> For the purposes of branch prediction optimizations, the probability
>> that a '__builtin_expect' expression is 'true' is controlled by GCC's
>> 'builtin-expect-probability' parameter, which defaults to 90%.  You can
>> also use '__builtin_expect_with_probability' to explicitly assign a
>> probability value to individual expressions.
>>
>> So I think that <= 10% is good enough, which is true in this case.
>>
>> I was tring to encourage the compiler to:
>> * Place this code path out of line, and
>> * Not do the stack manipulations (build a frame, spill registers)
>>   needed for a non-leaf function if this path isn't taken.
> 
> Understood, I think it's ok in this case because the shuffling only
> happens for order-10 page free events by default so it will be
> difficult to measure the perf impact either way.  But in other kernel
> contexts I think unlikely() annotation should come with numbers, 90%
> not taken is not sufficient in and of itself.

I'm not sure I fully understand your point.  I *think* you're
editorializing on unlikely() in general and not this specific code,
but it's a little hard to follow.

Your mention of "order-10 page free events" is confusing.  Do you mean 
"(order-10 page) free events", i.e. freeing of 1024 consecutive pages?  
Or are you using "order" as a synonym for "approximately" and you mean 
"approximately 10 (page free event)s"?

We both agree (I hope) that the number here is obvious on brief 
inspection: 1/BITS_PER_LONG.  

GCC's heuristics are tuned to value cycles on the fast path 9x as much as 
cycles on the slow path, so it will spend up to 9 cycles on the slow path 
to save a cycle on the fast path.

I've found one comment (https://pastebin.com/S8Y8tqZy) saying that
GCC < 9.x was a lot sloppier on the cost ratio and could pessimize
the code if the branch was more than ~ 1% taken.  Perhaps that's what 
you're remembering?

Fortunately, 1/64 = 1.56% is fairly close to 1%. so I'm not too
worried.

> You can add:
> 
> Acked-by: Dan Williams <dan.j.williams@intel.com>

Thank you!



* Re: [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18 19:29               ` George Spelvin
@ 2020-03-18 19:40                 ` Dan Williams
  2020-03-18 21:02                   ` George Spelvin
  0 siblings, 1 reply; 25+ messages in thread
From: Dan Williams @ 2020-03-18 19:40 UTC (permalink / raw)
  To: George Spelvin; +Cc: Kees Cook, Linux MM, Andrew Morton

On Wed, Mar 18, 2020 at 12:29 PM George Spelvin <lkml@sdf.org> wrote:
>
> On Wed, Mar 18, 2020 at 10:36:10AM -0700, Dan Williams wrote:
> > On Wed, Mar 18, 2020 at 1:20 AM George Spelvin <lkml@sdf.org> wrote:
> >> On Tue, Mar 17, 2020 at 08:53:55PM -0700, Dan Williams wrote:
> >>> I had the impression that unless unlikely is "mostly never" then it
> >>> can do more harm than good. Is a branch guaranteed to be taken every
> >>> BITS_PER_LONG'th occurrence really a candidate for unlikely()
> >>> annotation?
> >>
> >> I had to look this up.  GCC manual:
> >>
> >> For the purposes of branch prediction optimizations, the probability
> >> that a '__builtin_expect' expression is 'true' is controlled by GCC's
> >> 'builtin-expect-probability' parameter, which defaults to 90%.  You can
> >> also use '__builtin_expect_with_probability' to explicitly assign a
> >> probability value to individual expressions.
> >>
> >> So I think that <= 10% is good enough, which is true in this case.
> >>
> >> I was tring to encourage the compiler to:
> >> * Place this code path out of line, and
> >> * Not do the stack manipulations (build a frame, spill registers)
> >>   needed for a non-leaf function if this path isn't taken.
> >
> > Understood, I think it's ok in this case because the shuffling only
> > happens for order-10 page free events by default so it will be
> > difficult to measure the perf impact either way.  But in other kernel
> > contexts I think unlikely() annotation should come with numbers, 90%
> > not taken is not sufficient in and of itself.
>
> I'm not sure I fully understand your point.  I *think* you're
> editorializing on unlikely() in general and not this specific code,
> but it's a little hard to follow.

Yes, editorializing on unlikely(). Specifically I would normally ask
for perf numbers to show that the hint is worth it, but I talked
myself out of asking for that in this case.

> Your mention of "order-10 page free events" is confusing.  Do you mean
> "(order-10 page) free events", i.e. freeing of 1024 consecutive pages?
> Or are you using "order" as a synonym for "approximately" and you mean
> "approximately 10 (page free event)s"?

I'm referring to this:

      if (is_shuffle_order(order))
                add_to_free_area_random(page, &zone->free_area[order],

Where shuffle order is MAX_ORDER-1. I.e. this code is only triggered
when we might be releasing a 4MB buddy-page.

> We both agree (I hope) that the number here is obvious on brief
> inspection: 1/BITS_PER_LONG.
>
> GCC's heuristics are tuned to value cycles on the fast path 9x as much as
> cycles on the slow path, so it will spend up to 9 cycles on the slow path
> to save a cycle on the fast path.
>
> I've found one comment (https://pastebin.com/S8Y8tqZy) saying that
> GCC < 9.x was a lot sloppier on the cost ratio and could pessimize
> the code if the branch was more than ~ 1% taken.  Perhaps that's what
> you're remembering?

Yes, thanks for that digging!

> Fortunately, 1/64 = 1.56% is fairly close to 1%. so I'm not too
> worried.

Makes sense.



* Re: [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18 19:17             ` Alexander Duyck
@ 2020-03-18 20:06               ` George Spelvin
  0 siblings, 0 replies; 25+ messages in thread
From: George Spelvin @ 2020-03-18 20:06 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Kees Cook, Dan Williams, linux-mm, Andrew Morton, lkml

On Wed, Mar 18, 2020 at 12:17:14PM -0700, Alexander Duyck wrote:
> I was just putting it out there as a possibility. What I have seen in
> the past is that under some circumstances gcc can be smart enough to
> interpret that as a "branch on carry". My thought was you are likely
> having to test the value against itself and then you might be able to
> make use of shift and carry flag to avoid that. In addition you could
> get away from having to recast a unsigned value as a signed one in
> order to perform the bit test.

Ah, yes, it would be nice if gcc could use the carry bit for r
rather than having to devote a whole register to it.  But it has
to do two unrelated flag tests (zero and carry), and it's generally
pretty bad at taking advantage of preserved flag bits like that.

My ideal x86-64 object code would be:
	shlq	rand(%rip)
	jz	fixup
fixed:
	jnc	tail
	jmp	add_to_free_area
tail:
	jmp	add_to_free_area_tail
fixup:
	pushq	%rdx
	pushq	%rsi
	pushq	%rdi
	call	get_random_u64
	popq	%rdi
	popq	%rsi
	popq	%rdx
	stc
	adcq	%rax,%rax
	movq	%rax, rand(%rip)
	jmp	fixed

... but I don't know how to induce GCC to generate that, and
the function doesn't seem worthy of platform-specific asm.

(Note that I have to use an add on the slow path because lea doesn't
set the carry bit.)



* [PATCH v3] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18  1:44       ` [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random() George Spelvin
                           ` (3 preceding siblings ...)
  2020-03-18 15:26         ` Alexander Duyck
@ 2020-03-18 20:39         ` George Spelvin
  2020-03-18 21:34           ` Alexander Duyck
  2020-03-19 12:05           ` [PATCH v4] " George Spelvin
  4 siblings, 2 replies; 25+ messages in thread
From: George Spelvin @ 2020-03-18 20:39 UTC (permalink / raw)
  To: Kees Cook
  Cc: Dan Williams, linux-mm, Andrew Morton, Alexander Duyck,
	Randy Dunlap, lkml

The old code had separate "rand" and "rand_count" variables,
which could get out of sync with bad results.

In the worst case, two threads would see rand_count = 1 and
both decrement it, resulting in rand_count = 255 and rand being
filled with zeros for the next 255 calls.

Instead, pack them both into a single, atomically updateable,
variable.  This makes it a lot easier to reason about race
conditions.  They are still there - the code deliberately eschews
locking - but basically harmless on the rare occasions that
they happen.

Second, use READ_ONCE and WRITE_ONCE.  Without them, we are deep
in the land of nasal demons.  The compiler would be free to spill
temporaries to the static variables in arbitrarily perverse ways
and create hard-to-find bugs.

(Alternatively, we could declare the static variable "volatile",
one of the few places in the Linux kernel that would be correct,
but it would probably annoy Linus.)

Third, use long rather than u64.  This not only keeps the state
atomically updateable, it also speeds up the fast path on 32-bit
machines.  Saving at least three instructions on the fast path (one
load, one add-with-carry, and one store) is worth a second call
to get_random_u*() per 64 bits.  The fast path of get_random_u*
is less than the 3*64 = 192 instructions saved, and the slow path
happens every 64 bytes so isn't affected by the change.

Fourth, use left-shift rather than right.  Testing the sign bit
produces slightly smaller/faster code than testing the lsbit.

I've tried shifting both ways, and copying the desired bit to
a boolean before shifting rather than keeping separate full-width
r and rshift variables, but both produce larger code:

		x86_64	i386
This code	 94	 95
Explicit bool	103	 99
Lsbits		 99	101
Both		 96	100

In a perfect world, the x86-64 object code would be tiny,
and dominated by the unavoidable unpredictable branch, but
this function doesn't warrant arch-specific optimization.

add_to_free_area_random:
	shlq	rand(%rip)
	jz	3f
1:
	jnc	2f
	jmp	add_to_free_area	# tail call
2:
	jmp	add_to_free_area_tail
3:
	pushq	%rdx
	pushq	%rsi
	pushq	%rdi
	call	get_random_u64
	popq	%rdi
	popq	%rsi
	popq	%rdx
	stc
	adcq	%rax,%rax	# not lea because we need carry out
	movq	%rax, rand(%rip)
	jmp	1b

Signed-off-by: George Spelvin <lkml@sdf.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
---
v2: Rewrote commit message to explain existing races better.
    Made local variables unsigned to avoid (technically undefined)
    signed overflow.
v3: Typos fixed, Acked-by, expanded commit message.

 mm/shuffle.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/mm/shuffle.c b/mm/shuffle.c
index e0ed247f8d90..16c5fddc292f 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -186,22 +186,28 @@ void __meminit __shuffle_free_memory(pg_data_t *pgdat)
 void add_to_free_area_random(struct page *page, struct free_area *area,
 		int migratetype)
 {
-	static u64 rand;
-	static u8 rand_bits;
+	static unsigned long rand;	/* buffered random bits */
+	unsigned long r = READ_ONCE(rand), rshift = r << 1;
 
 	/*
-	 * The lack of locking is deliberate. If 2 threads race to
-	 * update the rand state it just adds to the entropy.
+	 * rand holds 0..BITS_PER_LONG-1 random msbits, followed by a
+	 * 1 bit, then zero-padding in the lsbits.  This allows us to
+	 * maintain the pre-generated bits and the count of bits in a
+	 * single, atomically updatable, variable.
+	 *
+	 * The lack of locking is deliberate. If two threads race to
+	 * update the rand state it just adds to the entropy.  The
+	 * worst that can happen is a random bit is used twice, or
+	 * get_random_long is called redundantly.
 	 */
-	if (rand_bits == 0) {
-		rand_bits = 64;
-		rand = get_random_u64();
+	if (unlikely(rshift == 0)) {
+		r = get_random_long();
+		rshift = r << 1 | 1;
 	}
+	WRITE_ONCE(rand, rshift);
 
-	if (rand & 1)
+	if ((long)r < 0)
 		add_to_free_area(page, area, migratetype);
 	else
 		add_to_free_area_tail(page, area, migratetype);
-	rand_bits--;
-	rand >>= 1;
 }
-- 
2.26.0.rc2



* Re: [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18 19:40                 ` Dan Williams
@ 2020-03-18 21:02                   ` George Spelvin
  0 siblings, 0 replies; 25+ messages in thread
From: George Spelvin @ 2020-03-18 21:02 UTC (permalink / raw)
  To: Dan Williams; +Cc: Kees Cook, Linux MM, Andrew Morton, lkml

On Wed, Mar 18, 2020 at 12:40:33PM -0700, Dan Williams wrote:
> Yes, editorializing on unlikely(). Specifically I would normally ask
> for perf numbers to show that the hint is worth it, but I talked
> myself out of asking for that in this case.

Ah, now I understand!  Yes, it's just gilding the lily, but as long as I 
was messing about in the area it seemed worth adding.

You're saying that if the code were a hotter code path, you'd want 
benchmarks, but given its actual infrequent usage, you're satisfied with 
the argument that it's probably right; it's not like we've crippled the 
kernel even if wrong.

> I'm referring to this:
> 
>       if (is_shuffle_order(order))
>                 add_to_free_area_random(page, &zone->free_area[order],
> 
> Where shuffle order is MAX_ORDER-1. I.e. this code is only triggered
> when we might be releasing a 4MB buddy-page.

Ding!  Okay, now it makes sense.  I didn't look that far up the call
stack.  I was auditing each and every call to a get_random function in
the kernel, when I came across this call site, struggled to understand
what it was doing, and rewrote it to be less wrong.

I only vaguely understand where it fits into the larger mm/ system
so I never noticed that condition on its use.

Yes, given that infrequent use, I'm even happier that I focused on code
size rather than performance.

Thank you for taking the time to explain.



* Re: [PATCH v3] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18 20:39         ` [PATCH v3] " George Spelvin
@ 2020-03-18 21:34           ` Alexander Duyck
  2020-03-18 22:49             ` George Spelvin
  2020-03-19 12:05           ` [PATCH v4] " George Spelvin
  1 sibling, 1 reply; 25+ messages in thread
From: Alexander Duyck @ 2020-03-18 21:34 UTC (permalink / raw)
  To: George Spelvin
  Cc: Kees Cook, Dan Williams, linux-mm, Andrew Morton, Randy Dunlap

On Wed, Mar 18, 2020 at 1:39 PM George Spelvin <lkml@sdf.org> wrote:
>
> The old code had separate "rand" and "rand_count" variables,
> which could get out of sync with bad results.
>
> In the worst case, two threads would see rand_count = 1 and
> both decrement it, resulting in rand_count = 255 and rand being
> filled with zeros for the next 255 calls.
>
> Instead, pack them both into a single, atomically updateable,
> variable.  This makes it a lot easier to reason about race
> conditions.  They are still there - the code deliberately eschews
> locking - but basically harmless on the rare occasions that
> they happen.
>
> Second, use READ_ONCE and WRITE_ONCE.  Without them, we are deep
> in the land of nasal demons.  The compiler would be free to spill
> temporaries to the static variables in arbitrarily perverse ways
> and create hard-to-find bugs.
>
> (Alternatively, we could declare the static variable "volatile",
> one of the few places in the Linux kernel that would be correct,
> but it would probably annoy Linus.)
>
> Third, use long rather than u64.  This not only keeps the state
> atomically updateable, it also speeds up the fast path on 32-bit
> machines.  Saving at least three instructions on the fast path (one
> load, one add-with-carry, and one store) is worth a second call
> to get_random_u*() per 64 bits.  The fast path of get_random_u*
> is less than the 3*64 = 192 instructions saved, and the slow path
> happens every 64 bytes so isn't affected by the change.
>
> Fourth, use left-shift rather than right.  Testing the sign bit
> produces slightly smaller/faster code than testing the lsbit.
>
> I've tried shifting both ways, and copying the desired bit to
> a boolean before shifting rather than keeping separate full-width
> r and rshift variables, but both produce larger code:
>
>                 x86_64  i386
> This code        94      95
> Explicit bool   103      99
> Lsbits           99     101
> Both             96     100
>
> In a perfect world, the x86-64 object code would be tiny,
> and dominated by the unavoidable unpredictable branch, but
> this function doesn't warrant arch-specific optimization.
>
> add_to_free_area_random:
>         shlq    rand(%rip)
>         jz      3f
> 1:
>         jnc     2f
>         jmp     add_to_free_area        # tail call
> 2:
>         jmp     add_to_free_area_tail
> 3:
>         pushq   %rdx
>         pushq   %rsi
>         pushq   %rdi
>         call    get_random_u64
>         popq    %rdi
>         popq    %rsi
>         popq    %rdx
>         stc
>         adcq    %rax,%rax       # not lea because we need carry out
>         movq    %rax, rand(%rip)
>         jmp     1b
>
> Signed-off-by: George Spelvin <lkml@sdf.org>
> Acked-by: Kees Cook <keescook@chromium.org>
> Acked-by: Dan Williams <dan.j.williams@intel.com>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org

What kernel is this based on? You might want to rebase on the latest
linux-next as it occurs to me that this function was renamed to
shuffle_pick_tail as I had incorporated a few bits of it into the
logic for placing buddy pages and reported pages on the tail of the
list.

> ---
> v2: Rewrote commit message to explain existing races better.
>     Made local variables unsigned to avoid (technically undefined)
>     signed overflow.
> v3: Typos fixed, Acked-by, expanded commit message.
>
>  mm/shuffle.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/mm/shuffle.c b/mm/shuffle.c
> index e0ed247f8d90..16c5fddc292f 100644
> --- a/mm/shuffle.c
> +++ b/mm/shuffle.c
> @@ -186,22 +186,28 @@ void __meminit __shuffle_free_memory(pg_data_t *pgdat)
>  void add_to_free_area_random(struct page *page, struct free_area *area,
>                 int migratetype)
>  {
> -       static u64 rand;
> -       static u8 rand_bits;
> +       static unsigned long rand;      /* buffered random bits */
> +       unsigned long r = READ_ONCE(rand), rshift = r << 1;
>
>         /*
> -        * The lack of locking is deliberate. If 2 threads race to
> -        * update the rand state it just adds to the entropy.
> +        * rand holds 0..BITS_PER_LONG-1 random msbits, followed by a
> +        * 1 bit, then zero-padding in the lsbits.  This allows us to
> +        * maintain the pre-generated bits and the count of bits in a
> +        * single, atomically updatable, variable.
> +        *
> +        * The lack of locking is deliberate. If two threads race to
> +        * update the rand state it just adds to the entropy.  The
> +        * worst that can happen is a random bit is used twice, or
> +        * get_random_long is called redundantly.
>          */
> -       if (rand_bits == 0) {
> -               rand_bits = 64;
> -               rand = get_random_u64();
> +       if (unlikely(rshift == 0)) {
> +               r = get_random_long();
> +               rshift = r << 1 | 1;
>         }
> +       WRITE_ONCE(rand, rshift);
>
> -       if (rand & 1)
> +       if ((long)r < 0)
>                 add_to_free_area(page, area, migratetype);
>         else
>                 add_to_free_area_tail(page, area, migratetype);
> -       rand_bits--;
> -       rand >>= 1;
>  }
> --
> 2.26.0.rc2


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18 21:34           ` Alexander Duyck
@ 2020-03-18 22:49             ` George Spelvin
  2020-03-18 22:57               ` Dan Williams
  0 siblings, 1 reply; 25+ messages in thread
From: George Spelvin @ 2020-03-18 22:49 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Kees Cook, Dan Williams, linux-mm, Andrew Morton, Randy Dunlap, lkml

On Wed, Mar 18, 2020 at 02:34:04PM -0700, Alexander Duyck wrote:
> What kernel is this based on? You might want to rebase on the latest
> linux-next as it occurs to me that this function was renamed to
> shuffle_pick_tail as I had incorporated a few bits of it into the
> logic for placing buddy pages and reported pages on the tail of the
> list.

5.5.8.  I didn't realize it made much difference, but I see it does. Due 
to the different return type, the best variant has probably changed and 
I'll have to re-check.

Since there's only one call site, should the function be moved to 
page_alloc.c and inlined?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18 22:49             ` George Spelvin
@ 2020-03-18 22:57               ` Dan Williams
  2020-03-18 23:18                 ` George Spelvin
  0 siblings, 1 reply; 25+ messages in thread
From: Dan Williams @ 2020-03-18 22:57 UTC (permalink / raw)
  To: George Spelvin
  Cc: Alexander Duyck, Kees Cook, linux-mm, Andrew Morton, Randy Dunlap

On Wed, Mar 18, 2020 at 3:49 PM George Spelvin <lkml@sdf.org> wrote:
>
> On Wed, Mar 18, 2020 at 02:34:04PM -0700, Alexander Duyck wrote:
> > What kernel is this based on? You might want to rebase on the latest
> > linux-next as it occurs to me that this function was renamed to
> > shuffle_pick_tail as I had incorporated a few bits of it into the
> > logic for placing buddy pages and reported pages on the tail of the
> > list.
>
> 5.5.8.  I didn't realize it made much difference, but I see it does. Due
> to the different return type, the best variant has probably changed and
> I'll have to re-check.
>
> Since there's only one call site, should the function be moved to
> page_alloc.c and inlined?

The intent was to make it compile-away on
CONFIG_SHUFFLE_PAGE_ALLOCATOR=n builds. However,
CONFIG_SHUFFLE_PAGE_ALLOCATOR=y is now the common case for distros.
So, it's not serving much purpose being in mm/shuffle.c. I'd support
moving it.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18 22:57               ` Dan Williams
@ 2020-03-18 23:18                 ` George Spelvin
  0 siblings, 0 replies; 25+ messages in thread
From: George Spelvin @ 2020-03-18 23:18 UTC (permalink / raw)
  To: Dan Williams
  Cc: Alexander Duyck, Kees Cook, linux-mm, Andrew Morton, Randy Dunlap, lkml

On Wed, Mar 18, 2020 at 03:57:01PM -0700, Dan Williams wrote:
> On Wed, Mar 18, 2020 at 3:49 PM George Spelvin <lkml@sdf.org> wrote:
>> Since there's only one call site, should the function be moved to
>> page_alloc.c and inlined?
> 
> The intent was to make it compile-away on
> CONFIG_SHUFFLE_PAGE_ALLOCATOR=n builds. However,
> CONFIG_SHUFFLE_PAGE_ALLOCATOR=y is now the common case for distros.
> So, it's not serving much purpose being in mm/shuffle.c. I'd support
> moving it.

The options are to define it as an inline function in shuffle.h,
which keeps the #ifdef localized, or to drop it into page_alloc.c,
which shrinks the shuffle.h interface but requires an #ifdef in
page_alloc.h.

Um... or does it?  If is_shuffle_order always returns false,
shuffle_pick_tail is dead code and will be elided by the compiler.

But will the static variable be elided, too?  Need to check.
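
A standalone toy (userspace, not the real mm/shuffle.h) is enough to
check with nm or objdump; sketch below, with the RNG and the config
guard replaced by stand-ins:

/*
 * Models an inline helper whose only caller is guarded by a
 * compile-time-false condition, like is_shuffle_order() on
 * CONFIG_SHUFFLE_PAGE_ALLOCATOR=n builds.
 */
static inline int pick_tail(void)
{
	static unsigned long rand;			/* would-be bit buffer */

	rand = rand * 6364136223846793005UL + 1;	/* RNG stand-in */
	return (long)rand < 0;
}

#define SHUFFLE_ENABLED 0

int free_one_page_model(void)
{
	if (SHUFFLE_ENABLED)
		return pick_tail();
	return 0;
}

With gcc -O2 -c, neither pick_tail() nor its static rand should appear
in the object file: once the constant-false branch is folded away, the
static inline function is unreferenced and never emitted.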


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v4] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-18 20:39         ` [PATCH v3] " George Spelvin
  2020-03-18 21:34           ` Alexander Duyck
@ 2020-03-19 12:05           ` George Spelvin
  2020-03-19 17:49             ` Alexander Duyck
  2020-03-20 17:58             ` Kees Cook
  1 sibling, 2 replies; 25+ messages in thread
From: George Spelvin @ 2020-03-19 12:05 UTC (permalink / raw)
  To: Kees Cook
  Cc: Dan Williams, linux-mm, Andrew Morton, Alexander Duyck,
	Randy Dunlap, lkml

The separate "rand" and "rand_count" variables could get out of
sync with bad results.

In the worst case, two threads would see rand_count=1 and both
decrement it, resulting in rand_count=255 and rand being filled
with zeros for the next 255 calls.
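
One possible interleaving, spelled out (rand_bits is a u8, so 0 - 1
wraps to 255):

    CPU A                            CPU B
    sees rand_bits == 1, no refill   sees rand_bits == 1, no refill
    rand_bits--  (1 -> 0)
                                     rand_bits--  (0 -> 255)
    rand >>= 1   (rand now 0)        rand >>= 1   (still 0)

After that, the next 255 calls see rand_bits != 0, skip the refill, and
shift bits out of an all-zero rand.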

Instead, pack them both into a single, atomically updateable,
variable.  This makes it a lot easier to reason about race
conditions.  They are still there - the code deliberately eschews
locking - but basically harmless on the rare occasions that
they happen.
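
For concreteness, here is a userspace model of the packed word, with
get_random_long() stubbed to a constant so the trace is reproducible
(a sketch only, assuming a 64-bit long; the real hunk is in the diff
below):

#include <stdio.h>

static unsigned long rand_buf;	/* msbits = unused random bits, then a 1 marker */

static unsigned long stub_get_random_long(void)
{
	return 0xdeadbeefcafef00dUL;	/* arbitrary stand-in value */
}

static int pick_tail(void)
{
	unsigned long r = rand_buf, rshift = r << 1;

	if (rshift == 0) {		/* marker shifted out: buffer empty */
		r = stub_get_random_long();
		rshift = r << 1 | 1;	/* fresh marker in the lsbit */
	}
	rand_buf = rshift;

	return (long)r < 0;		/* consume the msbit */
}

int main(void)
{
	int i;

	for (i = 0; i < 70; i++)	/* more than 64 calls forces a refill */
		putchar('0' + pick_tail());
	putchar('\n');
	return 0;
}

Each refill yields exactly 64 decisions: one from the fresh value and 63
from the buffered msbits, after which the lone marker bit shifts out and
rshift reads as zero again.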

Second, use READ_ONCE and WRITE_ONCE.  Because the random bit
buffer is accessed by multiple threads concurrently without
locking, omitting those puts us deep in the land of nasal demons.
The compiler would be free to spill to the static variable in
arbitrarily perverse ways and create hard-to-find bugs.

(I'm torn between this and just declaring the buffer "volatile".
Linux tends to prefer marking accesses rather than variables,
but in this case, every access to the buffer is volatile.
It makes no difference to the generated code.)
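
To check that outside the kernel tree, a throwaway file like the one
below (READ_ONCE/WRITE_ONCE here are simplified stand-ins for the real
macros, just a volatile cast) can be built with "gcc -O2 -S" and the
two pairs of functions compared:

#define READ_ONCE(x)	 (*(const volatile typeof(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile typeof(x) *)&(x) = (v))

static volatile unsigned long rand_vol;		/* volatile variable */
static unsigned long rand_marked;		/* marked accesses */

unsigned long load_vol(void)    { return rand_vol; }
unsigned long load_marked(void) { return READ_ONCE(rand_marked); }

void store_vol(unsigned long v)    { rand_vol = v; }
void store_marked(unsigned long v) { WRITE_ONCE(rand_marked, v); }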

Third, use long rather than u64.  This not only keeps the state
atomically updateable, it also speeds up the fast path on 32-bit
machines.  Saving at least three instructions on the fast path (one
load, one add-with-carry, and one store) is worth a second call
to get_random_u*() per 64 bits.  The fast path of get_random_u*
is less than the 3*64 = 192 instructions saved, and the slow path
happens every 64 bytes so isn't affected by the change.

Fourth, make the function inline.  It's small, and there's only
one caller (in mm/page_alloc.c:__free_one_page()), so avoid the
function call overhead.

Fifth, use the msbits of the buffer first (left shift) rather
than the lsbits (right shift).  Testing the sign bit produces
slightly smaller/faster code than testing the lsbit.

I've tried shifting both ways, and copying the desired bit to a
boolean before shifting rather than keeping separate full-width
r and rshift variables, but both produce larger code:

	x86-64 text size
Msbit   	42236
Lsbit   	42242 (+6)
Lsbit+bool	42258 (+22)
Msbit+bool	42284 (+52)

(Since this is straight-line code, size is a good proxy for number
of instructions and execution time.  Using READ/WRITE_ONCE instead of
volatile makes no difference.)

In a perfect world, on x86-64 the fast path would be:
	shlq	rand(%rip)
	jz	refill
refill_complete:
	jc	add_to_tail

but I don't see how to get gcc to generate that, and this
function isn't worth arch-specific implementation.

Signed-off-by: George Spelvin <lkml@sdf.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
---
v2: Rewrote commit message to explain existing races better.
    Made local variables unsigned to avoid (technically undefined)
    signed overflow.
v3: Typos fixed, Acked-by, expanded commit message.
v4: Rebase against -next; function has changed from 
    add_to_free_area_random() to shuffle_pick_tail.  Move to
    inline function in shuffle.h.
    Not sure if it's okay to keep Acked-by: after such a
    significant change.

 mm/shuffle.c | 23 -----------------------
 mm/shuffle.h | 26 +++++++++++++++++++++++++-
 2 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/mm/shuffle.c b/mm/shuffle.c
index 44406d9977c7..ea281d5e1f23 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -182,26 +182,3 @@ void __meminit __shuffle_free_memory(pg_data_t *pgdat)
 	for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
 		shuffle_zone(z);
 }
-
-bool shuffle_pick_tail(void)
-{
-	static u64 rand;
-	static u8 rand_bits;
-	bool ret;
-
-	/*
-	 * The lack of locking is deliberate. If 2 threads race to
-	 * update the rand state it just adds to the entropy.
-	 */
-	if (rand_bits == 0) {
-		rand_bits = 64;
-		rand = get_random_u64();
-	}
-
-	ret = rand & 1;
-
-	rand_bits--;
-	rand >>= 1;
-
-	return ret;
-}
diff --git a/mm/shuffle.h b/mm/shuffle.h
index 4d79f03b6658..fb79e05cd86d 100644
--- a/mm/shuffle.h
+++ b/mm/shuffle.h
@@ -22,7 +22,31 @@ enum mm_shuffle_ctl {
 DECLARE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
 extern void page_alloc_shuffle(enum mm_shuffle_ctl ctl);
 extern void __shuffle_free_memory(pg_data_t *pgdat);
-extern bool shuffle_pick_tail(void);
+static inline bool shuffle_pick_tail(void)
+{
+	static unsigned long rand;	/* buffered random bits */
+	unsigned long r = READ_ONCE(rand), rshift = r << 1;
+
+	/*
+	 * rand holds 0..BITS_PER_LONG-1 random msbits, followed by a
+	 * 1 bit, then zero-padding in the lsbits.  This allows us to
+	 * maintain the pre-generated bits and the count of bits in a
+	 * single, atomically updatable, variable.
+	 *
+	 * The lack of locking is deliberate. If two threads race to
+	 * update the rand state it just adds to the entropy.  The
+	 * worst that can happen is a random bit is used twice, or
+	 * get_random_long is called redundantly.
+	 */
+	if (unlikely(rshift == 0)) {
+		r = get_random_long();
+		rshift = r << 1 | 1;
+	}
+	WRITE_ONCE(rand, rshift);
+
+	return (long)r < 0;
+}
+
 static inline void shuffle_free_memory(pg_data_t *pgdat)
 {
 	if (!static_branch_unlikely(&page_alloc_shuffle_key))

base-commit: 47780d7892b77e922bbe19b5dea99cde06b2f0e5
-- 
2.26.0.rc2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v4] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-19 12:05           ` [PATCH v4] " George Spelvin
@ 2020-03-19 17:49             ` Alexander Duyck
  2020-03-20 17:58             ` Kees Cook
  1 sibling, 0 replies; 25+ messages in thread
From: Alexander Duyck @ 2020-03-19 17:49 UTC (permalink / raw)
  To: George Spelvin
  Cc: Kees Cook, Dan Williams, linux-mm, Andrew Morton, Randy Dunlap

On Thu, Mar 19, 2020 at 5:05 AM George Spelvin <lkml@sdf.org> wrote:
>
> The separate "rand" and "rand_count" variables could get out of
> sync with bad results.
>
> In the worst case, two threads would see rand_count=1 and both
> decrement it, resulting in rand_count=255 and rand being filled
> with zeros for the next 255 calls.
>
> Instead, pack them both into a single, atomically updateable,
> variable.  This makes it a lot easier to reason about race
> conditions.  They are still there - the code deliberately eschews
> locking - but basically harmless on the rare occasions that
> they happen.
>
> Second, use READ_ONCE and WRITE_ONCE.  Because the random bit
> buffer is accessed by multiple threads concurrently without
> locking, omitting those puts us deep in the land of nasal demons.
> The compiler would be free to spill to the static variable in
> arbitrarily perverse ways and create hard-to-find bugs.
>
> (I'm torn between this and just declaring the buffer "volatile".
> Linux tends to prefer marking accesses rather than variables,
> but in this case, every access to the buffer is volatile.
> It makes no difference to the generated code.)
>
> Third, use long rather than u64.  This not only keeps the state
> atomically updateable, it also speeds up the fast path on 32-bit
> machines.  Saving at least three instructions on the fast path (one
> load, one add-with-carry, and one store) is worth a second call
> to get_random_u*() per 64 bits.  The fast path of get_random_u*
> is less than the 3*64 = 192 instructions saved, and the slow path
> happens every 64 bytes so isn't affected by the change.
>
> Fourth, make the function inline.  It's small, and there's only
> one caller (in mm/page_alloc.c:__free_one_page()), so avoid the
> function call overhead.
>
> Fifth, use the msbits of the buffer first (left shift) rather
> than the lsbits (right shift).  Testing the sign bit produces
> slightly smaller/faster code than testing the lsbit.
>
> I've tried shifting both ways, and copying the desired bit to a
> boolean before shifting rather than keeping separate full-width
> r and rshift variables, but both produce larger code:
>
>         x86-64 text size
> Msbit           42236
> Lsbit           42242 (+6)
> Lsbit+bool      42258 (+22)
> Msbit+bool      42284 (+52)
>
> (Since this is straight-line code, size is a good proxy for number
> of instructions and execution time.  Using READ/WRITE_ONCE instead of
> volatile makes no difference.)
>
> In a perfect world, on x86-64 the fast path would be:
>         shlq    rand(%rip)
>         jz      refill
> refill_complete:
>         jc      add_to_tail
>
> but I don't see how to get gcc to generate that, and this
> function isn't worth arch-specific implementation.
>
> Signed-off-by: George Spelvin <lkml@sdf.org>
> Acked-by: Kees Cook <keescook@chromium.org>
> Acked-by: Dan Williams <dan.j.williams@intel.com>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org

Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>

> ---
> v2: Rewrote commit message to explain existing races better.
>     Made local variables unsigned to avoid (technically undefined)
>     signed overflow.
> v3: Typos fixed, Acked-by, expanded commit message.
> v4: Rebase against -next; function has changed from
>     add_to_free_area_random() to shuffle_pick_tail.  Move to
>     inline function in shuffle.h.
>     Not sure if it's okay to keep Acked-by: after such a
>     significant change.
>
>  mm/shuffle.c | 23 -----------------------
>  mm/shuffle.h | 26 +++++++++++++++++++++++++-
>  2 files changed, 25 insertions(+), 24 deletions(-)
>
> diff --git a/mm/shuffle.c b/mm/shuffle.c
> index 44406d9977c7..ea281d5e1f23 100644
> --- a/mm/shuffle.c
> +++ b/mm/shuffle.c
> @@ -182,26 +182,3 @@ void __meminit __shuffle_free_memory(pg_data_t *pgdat)
>         for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
>                 shuffle_zone(z);
>  }
> -
> -bool shuffle_pick_tail(void)
> -{
> -       static u64 rand;
> -       static u8 rand_bits;
> -       bool ret;
> -
> -       /*
> -        * The lack of locking is deliberate. If 2 threads race to
> -        * update the rand state it just adds to the entropy.
> -        */
> -       if (rand_bits == 0) {
> -               rand_bits = 64;
> -               rand = get_random_u64();
> -       }
> -
> -       ret = rand & 1;
> -
> -       rand_bits--;
> -       rand >>= 1;
> -
> -       return ret;
> -}
> diff --git a/mm/shuffle.h b/mm/shuffle.h
> index 4d79f03b6658..fb79e05cd86d 100644
> --- a/mm/shuffle.h
> +++ b/mm/shuffle.h
> @@ -22,7 +22,31 @@ enum mm_shuffle_ctl {
>  DECLARE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
>  extern void page_alloc_shuffle(enum mm_shuffle_ctl ctl);
>  extern void __shuffle_free_memory(pg_data_t *pgdat);
> -extern bool shuffle_pick_tail(void);
> +static inline bool shuffle_pick_tail(void)
> +{
> +       static unsigned long rand;      /* buffered random bits */
> +       unsigned long r = READ_ONCE(rand), rshift = r << 1;
> +
> +       /*
> +        * rand holds 0..BITS_PER_LONG-1 random msbits, followed by a
> +        * 1 bit, then zero-padding in the lsbits.  This allows us to
> +        * maintain the pre-generated bits and the count of bits in a
> +        * single, atomically updatable, variable.
> +        *
> +        * The lack of locking is deliberate. If two threads race to
> +        * update the rand state it just adds to the entropy.  The
> +        * worst that can happen is a random bit is used twice, or
> +        * get_random_long is called redundantly.
> +        */
> +       if (unlikely(rshift == 0)) {
> +               r = get_random_long();
> +               rshift = r << 1 | 1;
> +       }
> +       WRITE_ONCE(rand, rshift);
> +
> +       return (long)r < 0;
> +}
> +
>  static inline void shuffle_free_memory(pg_data_t *pgdat)
>  {
>         if (!static_branch_unlikely(&page_alloc_shuffle_key))
>
> base-commit: 47780d7892b77e922bbe19b5dea99cde06b2f0e5
> --
> 2.26.0.rc2


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4] mm/shuffle.c: Fix races in add_to_free_area_random()
  2020-03-19 12:05           ` [PATCH v4] " George Spelvin
  2020-03-19 17:49             ` Alexander Duyck
@ 2020-03-20 17:58             ` Kees Cook
  1 sibling, 0 replies; 25+ messages in thread
From: Kees Cook @ 2020-03-20 17:58 UTC (permalink / raw)
  To: George Spelvin
  Cc: Dan Williams, linux-mm, Andrew Morton, Alexander Duyck, Randy Dunlap

On Thu, Mar 19, 2020 at 12:05:22PM +0000, George Spelvin wrote:
> The separate "rand" and "rand_count" variables could get out of
> sync with bad results.
> 
> In the worst case, two threads would see rand_count=1 and both
> decrement it, resulting in rand_count=255 and rand being filled
> with zeros for the next 255 calls.
> 
> Instead, pack them both into a single, atomically updateable,
> variable.  This makes it a lot easier to reason about race
> conditions.  They are still there - the code deliberately eschews
> locking - but basically harmless on the rare occasions that
> they happen.
> 
> Second, use READ_ONCE and WRITE_ONCE.  Because the random bit
> buffer is accessed by multiple threads concurrently without
> locking, omitting those puts us deep in the land of nasal demons.
> The compiler would be free to spill to the static variable in
> arbitrarily perverse ways and create hard-to-find bugs.
> 
> (I'm torn between this and just declaring the buffer "volatile".
> Linux tends to prefer marking accesses rather than variables,
> but in this case, every access to the buffer is volatile.
> It makes no difference to the generated code.)
> 
> Third, use long rather than u64.  This not only keeps the state
> atomically updateable, it also speeds up the fast path on 32-bit
> machines.  Saving at least three instructions on the fast path (one
> load, one add-with-carry, and one store) is worth a second call
> to get_random_u*() per 64 bits.  The fast path of get_random_u*
> is less than the 3*64 = 192 instructions saved, and the slow path
> happens every 64 bytes so isn't affected by the change.
> 
> Fourth, make the function inline.  It's small, and there's only
> one caller (in mm/page_alloc.c:__free_one_page()), so avoid the
> function call overhead.
> 
> Fifth, use the msbits of the buffer first (left shift) rather
> than the lsbits (right shift).  Testing the sign bit produces
> slightly smaller/faster code than testing the lsbit.
> 
> I've tried shifting both ways, and copying the desired bit to a
> boolean before shifting rather than keeping separate full-width
> r and rshift variables, but both produce larger code:
> 
> 	x86-64 text size
> Msbit   	42236
> Lsbit   	42242 (+6)
> Lsbit+bool	42258 (+22)
> Msbit+bool	42284 (+52)
> 
> (Since this is straight-line code, size is a good proxy for number
> of instructions and execution time.  Using READ/WRITE_ONCE instead of
> volatile makes no difference.)
> 
> In a perfect world, on x86-64 the fast path would be:
> 	shlq	rand(%rip)
> 	jz	refill
> refill_complete:
> 	jc	add_to_tail
> 
> but I don't see how to get gcc to generate that, and this
> function isn't worth arch-specific implementation.
> 
> Signed-off-by: George Spelvin <lkml@sdf.org>
> Acked-by: Kees Cook <keescook@chromium.org>
> Acked-by: Dan Williams <dan.j.williams@intel.com>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org
> ---
> v2: Rewrote commit message to explain existing races better.
>     Made local variables unsigned to avoid (technically undefined)
>     signed overflow.
> v3: Typos fixed, Acked-by, expanded commit message.
> v4: Rebase against -next; function has changed from 
>     add_to_free_area_random() to shuffle_pick_tail.  Move to
>     inline function in shuffle.h.
>     Not sure if it's okay to keep Acked-by: after such a
>     significant change.

Good by me. Thanks!

-Kees

-- 
Kees Cook


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread

Thread overview: 25+ messages
2020-03-17 13:50 [PATCH] mm/shuffle.c: optimize add_to_free_area_random() George Spelvin
2020-03-17 21:44 ` Kees Cook
2020-03-17 23:06   ` George Spelvin
2020-03-17 23:38     ` Kees Cook
2020-03-18  1:44       ` [PATCH v2] mm/shuffle.c: Fix races in add_to_free_area_random() George Spelvin
2020-03-18  1:49         ` Randy Dunlap
2020-03-18  3:53         ` Dan Williams
2020-03-18  8:20           ` George Spelvin
2020-03-18 17:36             ` Dan Williams
2020-03-18 19:29               ` George Spelvin
2020-03-18 19:40                 ` Dan Williams
2020-03-18 21:02                   ` George Spelvin
2020-03-18  3:58         ` Kees Cook
2020-03-18 15:26         ` Alexander Duyck
2020-03-18 18:35           ` George Spelvin
2020-03-18 19:17             ` Alexander Duyck
2020-03-18 20:06               ` George Spelvin
2020-03-18 20:39         ` [PATCH v3] " George Spelvin
2020-03-18 21:34           ` Alexander Duyck
2020-03-18 22:49             ` George Spelvin
2020-03-18 22:57               ` Dan Williams
2020-03-18 23:18                 ` George Spelvin
2020-03-19 12:05           ` [PATCH v4] " George Spelvin
2020-03-19 17:49             ` Alexander Duyck
2020-03-20 17:58             ` Kees Cook
