From: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: kasan-dev@googlegroups.com, elver@google.com, patches@lists.linux.dev
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: [PATCH] kfence: buffer random bools in bitmask
Date: Wed, 26 Oct 2022 22:40:31 +0200
Message-ID: <20221026204031.1699061-1-Jason@zx2c4.com>

Recently kfence got a 4x speedup in its calls to the RNG, by internally
using get_random_u8() instead of get_random_u32() for its random boolean
values. We can extend that speedup by another 8x, to 32x total, by
buffering a long at a time and reading bits from it.
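
The idea, sketched here in minimal kernel-style C (this is not the patch
itself: the helper name is made up, and this version has no locking,
which the real code needs), is that one RNG call refills a 64-entry bit
buffer, and each boolean then consumes a single bit:

    #include <linux/bitops.h>	/* BITS_PER_LONG */
    #include <linux/random.h>	/* get_random_long() */

    static unsigned long bits;	/* buffered random bits */
    static unsigned int len;	/* number of unconsumed bits */

    static bool buffered_random_bool(void)	/* hypothetical helper */
    {
    	bool ret;

    	if (!len) {
    		/* Buffer exhausted: one RNG call yields 64 booleans. */
    		bits = get_random_long();
    		len = BITS_PER_LONG;
    	}
    	ret = bits & 1;	/* consume the low bit */
    	bits >>= 1;
    	len--;
    	return ret;
    }

The patch below keeps this state per-cpu instead, and protects it with
the interrupt disabling (or, on PREEMPT_RT, a local_lock) described next.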

I'd looked into introducing a get_random_bool(), along with the
complexities required for that kind of function to work in the general
case. But kfence is the only high-speed user of random booleans in a hot
path, so we're better off open-coding this to take advantage of kfence's
particularities.

In particular, we take advantage of the fact that kfence_guarded_alloc()
already disables interrupts for its raw spinlocks, so we can keep a
per-cpu buffered bitmask of booleans without adding any more interrupt
disabling.

This is slightly complicated by PREEMPT_RT, where we actually need to
take a local_lock instead. But the resulting code in both cases compiles
down to something very compact, and is basically zero cost.
Specifically, on !PREEMPT_RT, this amounts to:

    local_irq_save(flags);
    random boolean stuff;
    raw_spin_lock(&other_thing);
    do the existing stuff;
    raw_spin_unlock_irqrestore(&other_thing, flags);

By using a local_lock in the way this patch does, we now also get this
code on PREEMPT_RT:

    spin_lock(this_cpu_ptr(&local_lock));
    random boolean stuff;
    spin_unlock(this_cpu_ptr(&local_lock));
    raw_spin_lock_irqsave(&other_thing, flags);
    do the existing stuff;
    raw_spin_unlock_irqrestore(&other_thing, flags);

This is also optimal for RT systems. So all in all, this is pretty
good. But it takes some compile-time conditionals to accomplish this.
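
Those conditionals are needed because on !PREEMPT_RT, local_lock_irqsave()
has already disabled interrupts (saving the state into flags), so the
freelist lock can be a plain raw_spin_lock(), while on PREEMPT_RT the
local_lock does not disable interrupts, so the freelist lock must save
and restore flags itself. Condensed from the diff below, the generic
source that the compiler reduces to the two sequences above is:

    local_lock_irqsave(&pcpu_bools.lock, flags);
    /* ... pull one bit from the per-cpu buffer ... */
    if (IS_ENABLED(CONFIG_PREEMPT_RT))
    	raw_spin_lock_irqsave(&kfence_freelist_lock, flags);
    else
    	raw_spin_lock(&kfence_freelist_lock);
    /* ... existing freelist handling ... */
    if (IS_ENABLED(CONFIG_PREEMPT_RT))
    	raw_spin_unlock_irqrestore(&kfence_freelist_lock, flags);
    else
    	raw_spin_unlock(&kfence_freelist_lock);
    local_unlock_irqrestore(&pcpu_bools.lock, flags);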

Cc: Marco Elver <elver@google.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 mm/kfence/core.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 6cbd93f2007b..c212ae0cecba 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -356,21 +356,47 @@ static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t g
 				  unsigned long *stack_entries, size_t num_stack_entries,
 				  u32 alloc_stack_hash)
 {
+	struct random_bools {
+		unsigned long bits;
+		unsigned int len;
+		local_lock_t lock;
+	};
+	static DEFINE_PER_CPU(struct random_bools, pcpu_bools) = {
+		.lock = INIT_LOCAL_LOCK(pcpu_bools.lock)
+	};
+	struct random_bools *bools;
 	struct kfence_metadata *meta = NULL;
 	unsigned long flags;
 	struct slab *slab;
 	void *addr;
-	const bool random_right_allocate = get_random_u32_below(2);
+	bool random_right_allocate;
 	const bool random_fault = CONFIG_KFENCE_STRESS_TEST_FAULTS &&
 				  !get_random_u32_below(CONFIG_KFENCE_STRESS_TEST_FAULTS);
 
+	local_lock_irqsave(&pcpu_bools.lock, flags);
+	bools = raw_cpu_ptr(&pcpu_bools);
+	if (unlikely(!bools->len)) {
+		bools->bits = get_random_long();
+		bools->len = BITS_PER_LONG;
+	}
+	random_right_allocate = bools->bits & 1;
+	bools->bits >>= 1;
+	bools->len--;
+
 	/* Try to obtain a free object. */
-	raw_spin_lock_irqsave(&kfence_freelist_lock, flags);
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		raw_spin_lock_irqsave(&kfence_freelist_lock, flags);
+	else
+		raw_spin_lock(&kfence_freelist_lock);
 	if (!list_empty(&kfence_freelist)) {
 		meta = list_entry(kfence_freelist.next, struct kfence_metadata, list);
 		list_del_init(&meta->list);
 	}
-	raw_spin_unlock_irqrestore(&kfence_freelist_lock, flags);
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		raw_spin_unlock_irqrestore(&kfence_freelist_lock, flags);
+	else
+		raw_spin_unlock(&kfence_freelist_lock);
+	local_unlock_irqrestore(&pcpu_bools.lock, flags);
 	if (!meta) {
 		atomic_long_inc(&counters[KFENCE_COUNTER_SKIP_CAPACITY]);
 		return NULL;
-- 
2.38.1

