* [PATCH 00/15] stackdepot: allow evicting stack traces
From: andrey.konovalov @ 2023-08-29 17:11 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko
  Cc: Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka, kasan-dev,
	Evgenii Stepanov, Andrew Morton, linux-mm, linux-kernel,
	Andrey Konovalov

From: Andrey Konovalov <andreyknvl@google.com>

Currently, the stack depot grows indefinitely until it reaches its
capacity. Once that happens, the stack depot stops saving new stack
traces.

This creates a problem for using the stack depot for in-field testing
and in production.

For such uses, an ideal stack trace storage should:

1. Allow saving fresh stack traces on systems with a large uptime while
   limiting the amount of memory used to store the traces;
2. Have a low performance impact.

Implementing #1 in the stack depot is impossible with the current
keep-forever approach. This series aims to address that. Issue #2 is
left to be addressed in a future series.

This series changes the stack depot implementation to allow evicting
unneeded stack traces from the stack depot. The users of the stack depot
can do that via a new stack_depot_evict API.
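
To illustrate the intended usage, here is a rough sketch of how a user of
the stack depot could pair saving and evicting a trace. The sketch assumes
that stack_depot_evict takes the handle returned by stack_depot_save; the
actual interface is only added in patch 14, and the structure and helpers
below are made up for the example:

#include <linux/gfp.h>
#include <linux/stackdepot.h>
#include <linux/stacktrace.h>

/* Illustrative only: tracked_object and its helpers are made up. */
struct tracked_object {
	depot_stack_handle_t alloc_stack;
};

static void track_alloc(struct tracked_object *obj)
{
	unsigned long entries[16];
	unsigned int nr;

	nr = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
	/* The depot deduplicates identical traces and returns a handle. */
	obj->alloc_stack = stack_depot_save(entries, nr, GFP_KERNEL);
}

static void untrack_object(struct tracked_object *obj)
{
	/* Drop the trace once it is no longer needed to free up its slot. */
	if (obj->alloc_stack)
		stack_depot_evict(obj->alloc_stack);
	obj->alloc_stack = 0;
}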

Internal changes to the stack depot code include:

1. Storing stack traces in 32-frame-sized slots (vs precisely-sized slots
   in the current implementation);
2. Keeping available slots in a freelist (vs keeping an offset to the next
   free slot);
3. Using a read/write lock for synchronization (vs a lock-free approach
   combined with a spinlock).
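
To make the three changes above more concrete, here is a simplified model
of the new storage scheme (names are made up for the sketch and the pool
and hash table management is omitted; lookups take the lock for reading,
while saving and evicting records take it for writing):

/*
 * Simplified model of the reworked storage; illustrative only.
 */
#define SKETCH_MAX_FRAMES 32			/* item 1: fixed-size slots */

struct sketch_record {
	struct sketch_record *next;		/* freelist or bucket link */
	unsigned int size;
	unsigned long entries[SKETCH_MAX_FRAMES];
};

static struct sketch_record *sketch_freelist;	/* item 2: freelist head */
static DEFINE_RWLOCK(sketch_lock);		/* item 3: read/write lock */

static struct sketch_record *sketch_alloc(unsigned long *entries,
					  unsigned int nr)
{
	struct sketch_record *rec;
	unsigned long flags;

	write_lock_irqsave(&sketch_lock, flags);
	rec = sketch_freelist;
	if (rec) {
		/* Take a free fixed-size slot off the freelist. */
		sketch_freelist = rec->next;
		rec->size = min_t(unsigned int, nr, SKETCH_MAX_FRAMES);
		memcpy(rec->entries, entries, rec->size * sizeof(*entries));
	}
	write_unlock_irqrestore(&sketch_lock, flags);
	return rec;
}

static void sketch_evict(struct sketch_record *rec)
{
	unsigned long flags;

	write_lock_irqsave(&sketch_lock, flags);
	/* Return the slot to the freelist so it can be reused. */
	rec->next = sketch_freelist;
	sketch_freelist = rec;
	write_unlock_irqrestore(&sketch_lock, flags);
}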

This series also integrates the eviction functionality into the tag-based
KASAN modes. (I will investigate integrating it into the Generic mode as
well in the following iterations of this series.)
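
As a rough idea of how the tag-based modes make use of eviction (the entry
layout and helper below are made up for the sketch; the actual change is
in mm/kasan/tags.c, patch 15): when a new entry overwrites an old one in
the fixed-size stack ring, the handle of the overwritten entry is evicted:

/* Illustrative only: ring_entry and ring_record are invented names. */
struct ring_entry {
	depot_stack_handle_t stack;
	/* ... tagged pointer, size, pid, ... */
};

static void ring_record(struct ring_entry *entry,
			depot_stack_handle_t new_stack)
{
	depot_stack_handle_t old_stack = entry->stack;

	entry->stack = new_stack;
	/* The overwritten trace is no longer referenced, so evict it. */
	if (old_stack)
		stack_depot_evict(old_stack);
}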

Despite wasting some space on rounding up the size of each stack record
to 32 frames, with this change, the tag-based KASAN modes end up
consuming ~5% less memory in stack depot during boot (with the default
stack ring size of 32k entries). The reason for this is the eviction of
irrelevant stack traces from the stack depot, which frees up space for
other stack traces.

For other tools that heavily rely on the stack depot, like Generic KASAN
and KMSAN, this change leads to the stack depot capacity being reached
sooner than before. However, as these tools are mainly used in fuzzing
scenarios where the kernel is frequently rebooted, this outcome should
be acceptable.

There is no measurable boot-time performance impact of these changes for
KASAN on x86-64. I haven't done any tests for the arm64 modes (the stack
depot without performance optimizations is not suitable for the intended
use of those modes anyway), but I expect a similar result. Obtaining and
copying stack trace frames when saving them into the stack depot is what
takes the most time.

This series does not yet provide a way to configure the maximum size of
the stack depot externally (e.g. via a command-line parameter). This will
either be added in the following iterations of this series (if the
proposed approach gets approved) or together with the performance
improvement changes.

Andrey Konovalov (15):
  stackdepot: check disabled flag when fetching
  stackdepot: simplify __stack_depot_save
  stackdepot: drop valid bit from handles
  stackdepot: add depot_fetch_stack helper
  stackdepot: use fixed-sized slots for stack records
  stackdepot: fix and clean-up atomic annotations
  stackdepot: rework helpers for depot_alloc_stack
  stackdepot: rename next_pool_required to new_pool_required
  stackdepot: store next pool pointer in new_pool
  stackdepot: store free stack records in a freelist
  stackdepot: use read/write lock
  stackdepot: add refcount for records
  stackdepot: add backwards links to hash table buckets
  stackdepot: allow users to evict stack traces
  kasan: use stack_depot_evict for tag-based modes

 include/linux/stackdepot.h |  11 ++
 lib/stackdepot.c           | 361 ++++++++++++++++++++++++-------------
 mm/kasan/tags.c            |   7 +-
 3 files changed, 249 insertions(+), 130 deletions(-)

-- 
2.25.1



* [PATCH 01/15] stackdepot: check disabled flag when fetching
From: andrey.konovalov @ 2023-08-29 17:11 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko
  Cc: Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka, kasan-dev,
	Evgenii Stepanov, Andrew Morton, linux-mm, linux-kernel,
	Andrey Konovalov

From: Andrey Konovalov <andreyknvl@google.com>

Do not try fetching a stack trace from the stack depot if the
stack_depot_disabled flag is enabled.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 lib/stackdepot.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 2f5aa851834e..3a945c7206f3 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -477,7 +477,7 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle,
 	 */
 	kmsan_unpoison_memory(entries, sizeof(*entries));
 
-	if (!handle)
+	if (!handle || stack_depot_disabled)
 		return 0;
 
 	if (parts.pool_index > pool_index_cached) {
-- 
2.25.1



* [PATCH 02/15] stackdepot: simplify __stack_depot_save
From: andrey.konovalov @ 2023-08-29 17:11 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko
  Cc: Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka, kasan-dev,
	Evgenii Stepanov, Andrew Morton, linux-mm, linux-kernel,
	Andrey Konovalov

From: Andrey Konovalov <andreyknvl@google.com>

The retval local variable in __stack_depot_save has the union type
handle_parts, but the function never uses anything but the union's
handle field.

Define retval simply as depot_stack_handle_t to simplify the code.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 lib/stackdepot.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 3a945c7206f3..0772125efe8a 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -360,7 +360,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 					gfp_t alloc_flags, bool can_alloc)
 {
 	struct stack_record *found = NULL, **bucket;
-	union handle_parts retval = { .handle = 0 };
+	depot_stack_handle_t handle = 0;
 	struct page *page = NULL;
 	void *prealloc = NULL;
 	unsigned long flags;
@@ -377,7 +377,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 	nr_entries = filter_irq_stacks(entries, nr_entries);
 
 	if (unlikely(nr_entries == 0) || stack_depot_disabled)
-		goto fast_exit;
+		return 0;
 
 	hash = hash_stack(entries, nr_entries);
 	bucket = &stack_table[hash & stack_hash_mask];
@@ -443,9 +443,8 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 		free_pages((unsigned long)prealloc, DEPOT_POOL_ORDER);
 	}
 	if (found)
-		retval.handle = found->handle.handle;
-fast_exit:
-	return retval.handle;
+		handle = found->handle.handle;
+	return handle;
 }
 EXPORT_SYMBOL_GPL(__stack_depot_save);
 
-- 
2.25.1



* [PATCH 03/15] stackdepot: drop valid bit from handles
From: andrey.konovalov @ 2023-08-29 17:11 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko
  Cc: Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka, kasan-dev,
	Evgenii Stepanov, Andrew Morton, linux-mm, linux-kernel,
	Andrey Konovalov

From: Andrey Konovalov <andreyknvl@google.com>

Stack depot doesn't use the valid bit in handles in any way, so drop it.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 lib/stackdepot.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 0772125efe8a..482eac40791e 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -32,13 +32,12 @@
 
 #define DEPOT_HANDLE_BITS (sizeof(depot_stack_handle_t) * 8)
 
-#define DEPOT_VALID_BITS 1
 #define DEPOT_POOL_ORDER 2 /* Pool size order, 4 pages */
 #define DEPOT_POOL_SIZE (1LL << (PAGE_SHIFT + DEPOT_POOL_ORDER))
 #define DEPOT_STACK_ALIGN 4
 #define DEPOT_OFFSET_BITS (DEPOT_POOL_ORDER + PAGE_SHIFT - DEPOT_STACK_ALIGN)
-#define DEPOT_POOL_INDEX_BITS (DEPOT_HANDLE_BITS - DEPOT_VALID_BITS - \
-			       DEPOT_OFFSET_BITS - STACK_DEPOT_EXTRA_BITS)
+#define DEPOT_POOL_INDEX_BITS (DEPOT_HANDLE_BITS - DEPOT_OFFSET_BITS - \
+			       STACK_DEPOT_EXTRA_BITS)
 #define DEPOT_POOLS_CAP 8192
 #define DEPOT_MAX_POOLS \
 	(((1LL << (DEPOT_POOL_INDEX_BITS)) < DEPOT_POOLS_CAP) ? \
@@ -50,7 +49,6 @@ union handle_parts {
 	struct {
 		u32 pool_index	: DEPOT_POOL_INDEX_BITS;
 		u32 offset	: DEPOT_OFFSET_BITS;
-		u32 valid	: DEPOT_VALID_BITS;
 		u32 extra	: STACK_DEPOT_EXTRA_BITS;
 	};
 };
@@ -303,7 +301,6 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 	stack->size = size;
 	stack->handle.pool_index = pool_index;
 	stack->handle.offset = pool_offset >> DEPOT_STACK_ALIGN;
-	stack->handle.valid = 1;
 	stack->handle.extra = 0;
 	memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
 	pool_offset += required_size;
-- 
2.25.1



* [PATCH 04/15] stackdepot: add depot_fetch_stack helper
From: andrey.konovalov @ 2023-08-29 17:11 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko
  Cc: Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka, kasan-dev,
	Evgenii Stepanov, Andrew Morton, linux-mm, linux-kernel,
	Andrey Konovalov

From: Andrey Konovalov <andreyknvl@google.com>

Add a depot_fetch_stack helper function that fetches the pointer to
a stack record.

With this change, all static depot_* functions now operate on stack pools
and the exported stack_depot_* functions operate on the hash table.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 lib/stackdepot.c | 45 ++++++++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 17 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 482eac40791e..2128108f2acb 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -304,6 +304,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 	stack->handle.extra = 0;
 	memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
 	pool_offset += required_size;
+
 	/*
 	 * Let KMSAN know the stored stack record is initialized. This shall
 	 * prevent false positive reports if instrumented code accesses it.
@@ -313,6 +314,32 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 	return stack;
 }
 
+static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
+{
+	union handle_parts parts = { .handle = handle };
+	/*
+	 * READ_ONCE pairs with potential concurrent write in
+	 * depot_alloc_stack.
+	 */
+	int pool_index_cached = READ_ONCE(pool_index);
+	void *pool;
+	size_t offset = parts.offset << DEPOT_STACK_ALIGN;
+	struct stack_record *stack;
+
+	if (parts.pool_index > pool_index_cached) {
+		WARN(1, "pool index %d out of bounds (%d) for stack id %08x\n",
+			parts.pool_index, pool_index_cached, handle);
+		return NULL;
+	}
+
+	pool = stack_pools[parts.pool_index];
+	if (!pool)
+		return NULL;
+
+	stack = pool + offset;
+	return stack;
+}
+
 /* Calculates the hash for a stack. */
 static inline u32 hash_stack(unsigned long *entries, unsigned int size)
 {
@@ -456,14 +483,6 @@ EXPORT_SYMBOL_GPL(stack_depot_save);
 unsigned int stack_depot_fetch(depot_stack_handle_t handle,
 			       unsigned long **entries)
 {
-	union handle_parts parts = { .handle = handle };
-	/*
-	 * READ_ONCE pairs with potential concurrent write in
-	 * depot_alloc_stack.
-	 */
-	int pool_index_cached = READ_ONCE(pool_index);
-	void *pool;
-	size_t offset = parts.offset << DEPOT_STACK_ALIGN;
 	struct stack_record *stack;
 
 	*entries = NULL;
@@ -476,15 +495,7 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle,
 	if (!handle || stack_depot_disabled)
 		return 0;
 
-	if (parts.pool_index > pool_index_cached) {
-		WARN(1, "pool index %d out of bounds (%d) for stack id %08x\n",
-			parts.pool_index, pool_index_cached, handle);
-		return 0;
-	}
-	pool = stack_pools[parts.pool_index];
-	if (!pool)
-		return 0;
-	stack = pool + offset;
+	stack = depot_fetch_stack(handle);
 
 	*entries = stack->entries;
 	return stack->size;
-- 
2.25.1



* [PATCH 05/15] stackdepot: use fixed-sized slots for stack records
From: andrey.konovalov @ 2023-08-29 17:11 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko
  Cc: Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka, kasan-dev,
	Evgenii Stepanov, Andrew Morton, linux-mm, linux-kernel,
	Andrey Konovalov

From: Andrey Konovalov <andreyknvl@google.com>

Instead of storing stack records in stack depot pools one right after
another, use 32-frame-sized slots.

This is a preparatory patch for implementing the eviction of stack records
from the stack depot.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 lib/stackdepot.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 2128108f2acb..93191ee70fc3 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -42,6 +42,7 @@
 #define DEPOT_MAX_POOLS \
 	(((1LL << (DEPOT_POOL_INDEX_BITS)) < DEPOT_POOLS_CAP) ? \
 	 (1LL << (DEPOT_POOL_INDEX_BITS)) : DEPOT_POOLS_CAP)
+#define DEPOT_STACK_MAX_FRAMES 32
 
 /* Compact structure that stores a reference to a stack. */
 union handle_parts {
@@ -58,9 +59,12 @@ struct stack_record {
 	u32 hash;			/* Hash in the hash table */
 	u32 size;			/* Number of stored frames */
 	union handle_parts handle;
-	unsigned long entries[];	/* Variable-sized array of frames */
+	unsigned long entries[DEPOT_STACK_MAX_FRAMES];	/* Frames */
 };
 
+#define DEPOT_STACK_RECORD_SIZE \
+		ALIGN(sizeof(struct stack_record), 1 << DEPOT_STACK_ALIGN)
+
 static bool stack_depot_disabled;
 static bool __stack_depot_early_init_requested __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT);
 static bool __stack_depot_early_init_passed __initdata;
@@ -258,9 +262,7 @@ static struct stack_record *
 depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 {
 	struct stack_record *stack;
-	size_t required_size = struct_size(stack, entries, size);
-
-	required_size = ALIGN(required_size, 1 << DEPOT_STACK_ALIGN);
+	size_t required_size = DEPOT_STACK_RECORD_SIZE;
 
 	/* Check if there is not enough space in the current pool. */
 	if (unlikely(pool_offset + required_size > DEPOT_POOL_SIZE)) {
@@ -295,6 +297,10 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 	if (stack_pools[pool_index] == NULL)
 		return NULL;
 
+	/* Limit number of saved frames to DEPOT_STACK_MAX_FRAMES. */
+	if (size > DEPOT_STACK_MAX_FRAMES)
+		size = DEPOT_STACK_MAX_FRAMES;
+
 	/* Save the stack trace. */
 	stack = stack_pools[pool_index] + pool_offset;
 	stack->hash = hash;
-- 
2.25.1



* [PATCH 06/15] stackdepot: fix and clean-up atomic annotations
From: andrey.konovalov @ 2023-08-29 17:11 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko
  Cc: Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka, kasan-dev,
	Evgenii Stepanov, Andrew Morton, linux-mm, linux-kernel,
	Andrey Konovalov

From: Andrey Konovalov <andreyknvl@google.com>

Simplify comments accompanying the use of atomic accesses in the
stack depot code.

Also turn the smp_load_acquire of next_pool_required in depot_init_pool
into READ_ONCE, as both depot_init_pool and all smp_store_release's to
this variable are executed under the stack depot lock.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>

---

This patch is not strictly required, as the atomic accesses are fully
removed in one of the later patches. However, I decided to keep the
patch just in case we end up needing these atomics in the following
iterations of this series.
---
 lib/stackdepot.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 93191ee70fc3..9ae71e1ef1a7 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -226,10 +226,10 @@ static void depot_init_pool(void **prealloc)
 	/*
 	 * If the next pool is already initialized or the maximum number of
 	 * pools is reached, do not use the preallocated memory.
-	 * smp_load_acquire() here pairs with smp_store_release() below and
-	 * in depot_alloc_stack().
+	 * READ_ONCE is only used to mark the variable as atomic,
+	 * there are no concurrent writes.
 	 */
-	if (!smp_load_acquire(&next_pool_required))
+	if (!READ_ONCE(next_pool_required))
 		return;
 
 	/* Check if the current pool is not yet allocated. */
@@ -250,8 +250,8 @@ static void depot_init_pool(void **prealloc)
 		 * At this point, either the next pool is initialized or the
 		 * maximum number of pools is reached. In either case, take
 		 * note that initializing another pool is not required.
-		 * This smp_store_release pairs with smp_load_acquire() above
-		 * and in stack_depot_save().
+		 * smp_store_release pairs with smp_load_acquire in
+		 * stack_depot_save.
 		 */
 		smp_store_release(&next_pool_required, 0);
 	}
@@ -275,15 +275,15 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 		/*
 		 * Move on to the next pool.
 		 * WRITE_ONCE pairs with potential concurrent read in
-		 * stack_depot_fetch().
+		 * stack_depot_fetch.
 		 */
 		WRITE_ONCE(pool_index, pool_index + 1);
 		pool_offset = 0;
 		/*
 		 * If the maximum number of pools is not reached, take note
 		 * that the next pool needs to initialized.
-		 * smp_store_release() here pairs with smp_load_acquire() in
-		 * stack_depot_save() and depot_init_pool().
+		 * smp_store_release pairs with smp_load_acquire in
+		 * stack_depot_save.
 		 */
 		if (pool_index + 1 < DEPOT_MAX_POOLS)
 			smp_store_release(&next_pool_required, 1);
@@ -414,8 +414,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 
 	/*
 	 * Fast path: look the stack trace up without locking.
-	 * The smp_load_acquire() here pairs with smp_store_release() to
-	 * |bucket| below.
+	 * smp_load_acquire pairs with smp_store_release to |bucket| below.
 	 */
 	found = find_stack(smp_load_acquire(bucket), entries, nr_entries, hash);
 	if (found)
@@ -425,8 +424,8 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 	 * Check if another stack pool needs to be initialized. If so, allocate
 	 * the memory now - we won't be able to do that under the lock.
 	 *
-	 * The smp_load_acquire() here pairs with smp_store_release() to
-	 * |next_pool_inited| in depot_alloc_stack() and depot_init_pool().
+	 * smp_load_acquire pairs with smp_store_release
+	 * in depot_alloc_stack and depot_init_pool.
 	 */
 	if (unlikely(can_alloc && smp_load_acquire(&next_pool_required))) {
 		/*
@@ -452,8 +451,8 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 		if (new) {
 			new->next = *bucket;
 			/*
-			 * This smp_store_release() pairs with
-			 * smp_load_acquire() from |bucket| above.
+			 * smp_store_release pairs with smp_load_acquire
+			 * from |bucket| above.
 			 */
 			smp_store_release(bucket, new);
 			found = new;
-- 
2.25.1



* [PATCH 07/15] stackdepot: rework helpers for depot_alloc_stack
From: andrey.konovalov @ 2023-08-29 17:11 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko
  Cc: Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka, kasan-dev,
	Evgenii Stepanov, Andrew Morton, linux-mm, linux-kernel,
	Andrey Konovalov

From: Andrey Konovalov <andreyknvl@google.com>

Split the code in depot_alloc_stack and depot_init_pool into 3 functions:

1. depot_keep_next_pool that keeps preallocated memory for the next pool
   if required.

2. depot_update_pools that moves on to the next pool if there's no space
   left in the current pool, uses preallocated memory for the new current
   pool if required, and calls depot_keep_next_pool otherwise.

3. depot_alloc_stack that calls depot_update_pools and then allocates
   a stack record as before.

This makes it somewhat easier to follow the logic of depot_alloc_stack
and also serves as a preparation for implementing the eviction of stack
records from the stack depot.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 lib/stackdepot.c | 85 +++++++++++++++++++++++++++---------------------
 1 file changed, 48 insertions(+), 37 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 9ae71e1ef1a7..869d520bc690 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -220,11 +220,11 @@ int stack_depot_init(void)
 }
 EXPORT_SYMBOL_GPL(stack_depot_init);
 
-/* Uses preallocated memory to initialize a new stack depot pool. */
-static void depot_init_pool(void **prealloc)
+/* Keeps the preallocated memory to be used for the next stack depot pool. */
+static void depot_keep_next_pool(void **prealloc)
 {
 	/*
-	 * If the next pool is already initialized or the maximum number of
+	 * If the next pool is already saved or the maximum number of
 	 * pools is reached, do not use the preallocated memory.
 	 * READ_ONCE is only used to mark the variable as atomic,
 	 * there are no concurrent writes.
@@ -232,44 +232,33 @@ static void depot_init_pool(void **prealloc)
 	if (!READ_ONCE(next_pool_required))
 		return;
 
-	/* Check if the current pool is not yet allocated. */
-	if (stack_pools[pool_index] == NULL) {
-		/* Use the preallocated memory for the current pool. */
-		stack_pools[pool_index] = *prealloc;
+	/*
+	 * Use the preallocated memory for the next pool
+	 * as long as we do not exceed the maximum number of pools.
+	 */
+	if (pool_index + 1 < DEPOT_MAX_POOLS) {
+		stack_pools[pool_index + 1] = *prealloc;
 		*prealloc = NULL;
-	} else {
-		/*
-		 * Otherwise, use the preallocated memory for the next pool
-		 * as long as we do not exceed the maximum number of pools.
-		 */
-		if (pool_index + 1 < DEPOT_MAX_POOLS) {
-			stack_pools[pool_index + 1] = *prealloc;
-			*prealloc = NULL;
-		}
-		/*
-		 * At this point, either the next pool is initialized or the
-		 * maximum number of pools is reached. In either case, take
-		 * note that initializing another pool is not required.
-		 * smp_store_release pairs with smp_load_acquire in
-		 * stack_depot_save.
-		 */
-		smp_store_release(&next_pool_required, 0);
 	}
+
+	/*
+	 * At this point, either the next pool is kept or the maximum
+	 * number of pools is reached. In either case, take note that
+	 * keeping another pool is not required.
+	 * smp_store_release pairs with smp_load_acquire in stack_depot_save.
+	 */
+	smp_store_release(&next_pool_required, 0);
 }
 
-/* Allocates a new stack in a stack depot pool. */
-static struct stack_record *
-depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
+/* Updates refences to the current and the next stack depot pools. */
+static bool depot_update_pools(size_t required_size, void **prealloc)
 {
-	struct stack_record *stack;
-	size_t required_size = DEPOT_STACK_RECORD_SIZE;
-
 	/* Check if there is not enough space in the current pool. */
 	if (unlikely(pool_offset + required_size > DEPOT_POOL_SIZE)) {
 		/* Bail out if we reached the pool limit. */
 		if (unlikely(pool_index + 1 >= DEPOT_MAX_POOLS)) {
 			WARN_ONCE(1, "Stack depot reached limit capacity");
-			return NULL;
+			return false;
 		}
 
 		/*
@@ -279,9 +268,10 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 		 */
 		WRITE_ONCE(pool_index, pool_index + 1);
 		pool_offset = 0;
+
 		/*
 		 * If the maximum number of pools is not reached, take note
-		 * that the next pool needs to initialized.
+		 * that the next pool needs to be initialized.
 		 * smp_store_release pairs with smp_load_acquire in
 		 * stack_depot_save.
 		 */
@@ -289,9 +279,30 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 			smp_store_release(&next_pool_required, 1);
 	}
 
-	/* Assign the preallocated memory to a pool if required. */
+	/* Check if the current pool is not yet allocated. */
+	if (*prealloc && stack_pools[pool_index] == NULL) {
+		/* Use the preallocated memory for the current pool. */
+		stack_pools[pool_index] = *prealloc;
+		*prealloc = NULL;
+		return true;
+	}
+
+	/* Otherwise, try using the preallocated memory for the next pool. */
 	if (*prealloc)
-		depot_init_pool(prealloc);
+		depot_keep_next_pool(prealloc);
+	return true;
+}
+
+/* Allocates a new stack in a stack depot pool. */
+static struct stack_record *
+depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
+{
+	struct stack_record *stack;
+	size_t required_size = DEPOT_STACK_RECORD_SIZE;
+
+	/* Update current and next pools if required and possible. */
+	if (!depot_update_pools(required_size, prealloc))
+		return NULL;
 
 	/* Check if we have a pool to save the stack trace. */
 	if (stack_pools[pool_index] == NULL)
@@ -325,7 +336,7 @@ static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
 	union handle_parts parts = { .handle = handle };
 	/*
 	 * READ_ONCE pairs with potential concurrent write in
-	 * depot_alloc_stack.
+	 * depot_update_pools.
 	 */
 	int pool_index_cached = READ_ONCE(pool_index);
 	void *pool;
@@ -425,7 +436,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 	 * the memory now - we won't be able to do that under the lock.
 	 *
 	 * smp_load_acquire pairs with smp_store_release
-	 * in depot_alloc_stack and depot_init_pool.
+	 * in depot_update_pools and depot_keep_next_pool.
 	 */
 	if (unlikely(can_alloc && smp_load_acquire(&next_pool_required))) {
 		/*
@@ -462,7 +473,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 		 * Stack depot already contains this stack trace, but let's
 		 * keep the preallocated memory for the future.
 		 */
-		depot_init_pool(&prealloc);
+		depot_keep_next_pool(&prealloc);
 	}
 
 	raw_spin_unlock_irqrestore(&pool_lock, flags);
-- 
2.25.1



* [PATCH 08/15] stackdepot: rename next_pool_required to new_pool_required
From: andrey.konovalov @ 2023-08-29 17:11 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko
  Cc: Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka, kasan-dev,
	Evgenii Stepanov, Andrew Morton, linux-mm, linux-kernel,
	Andrey Konovalov

From: Andrey Konovalov <andreyknvl@google.com>

Rename next_pool_required to new_pool_required.

This is a purely code readability change: the following patch will change
the stack depot to store the pointer to the new pool in a separate
variable, and "new" seems like a more logical name.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 lib/stackdepot.c | 47 +++++++++++++++++++++++------------------------
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 869d520bc690..11934ea3b1c2 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -94,12 +94,11 @@ static size_t pool_offset;
 static DEFINE_RAW_SPINLOCK(pool_lock);
 /*
  * Stack depot tries to keep an extra pool allocated even before it runs out
- * of space in the currently used pool.
- * This flag marks that this next extra pool needs to be allocated and
- * initialized. It has the value 0 when either the next pool is not yet
- * initialized or the limit on the number of pools is reached.
+ * of space in the currently used pool. This flag marks whether this extra pool
+ * needs to be allocated. It has the value 0 when either an extra pool is not
+ * yet allocated or if the limit on the number of pools is reached.
  */
-static int next_pool_required = 1;
+static int new_pool_required = 1;
 
 static int __init disable_stack_depot(char *str)
 {
@@ -220,20 +219,20 @@ int stack_depot_init(void)
 }
 EXPORT_SYMBOL_GPL(stack_depot_init);
 
-/* Keeps the preallocated memory to be used for the next stack depot pool. */
-static void depot_keep_next_pool(void **prealloc)
+/* Keeps the preallocated memory to be used for a new stack depot pool. */
+static void depot_keep_new_pool(void **prealloc)
 {
 	/*
-	 * If the next pool is already saved or the maximum number of
+	 * If a new pool is already saved or the maximum number of
 	 * pools is reached, do not use the preallocated memory.
 	 * READ_ONCE is only used to mark the variable as atomic,
 	 * there are no concurrent writes.
 	 */
-	if (!READ_ONCE(next_pool_required))
+	if (!READ_ONCE(new_pool_required))
 		return;
 
 	/*
-	 * Use the preallocated memory for the next pool
+	 * Use the preallocated memory for the new pool
 	 * as long as we do not exceed the maximum number of pools.
 	 */
 	if (pool_index + 1 < DEPOT_MAX_POOLS) {
@@ -242,12 +241,12 @@ static void depot_keep_next_pool(void **prealloc)
 	}
 
 	/*
-	 * At this point, either the next pool is kept or the maximum
+	 * At this point, either a new pool is kept or the maximum
 	 * number of pools is reached. In either case, take note that
 	 * keeping another pool is not required.
 	 * smp_store_release pairs with smp_load_acquire in stack_depot_save.
 	 */
-	smp_store_release(&next_pool_required, 0);
+	smp_store_release(&new_pool_required, 0);
 }
 
 /* Updates refences to the current and the next stack depot pools. */
@@ -262,7 +261,7 @@ static bool depot_update_pools(size_t required_size, void **prealloc)
 		}
 
 		/*
-		 * Move on to the next pool.
+		 * Move on to the new pool.
 		 * WRITE_ONCE pairs with potential concurrent read in
 		 * stack_depot_fetch.
 		 */
@@ -271,12 +270,12 @@ static bool depot_update_pools(size_t required_size, void **prealloc)
 
 		/*
 		 * If the maximum number of pools is not reached, take note
-		 * that the next pool needs to be initialized.
+		 * that yet another new pool needs to be allocated.
 		 * smp_store_release pairs with smp_load_acquire in
 		 * stack_depot_save.
 		 */
 		if (pool_index + 1 < DEPOT_MAX_POOLS)
-			smp_store_release(&next_pool_required, 1);
+			smp_store_release(&new_pool_required, 1);
 	}
 
 	/* Check if the current pool is not yet allocated. */
@@ -287,9 +286,9 @@ static bool depot_update_pools(size_t required_size, void **prealloc)
 		return true;
 	}
 
-	/* Otherwise, try using the preallocated memory for the next pool. */
+	/* Otherwise, try using the preallocated memory for a new pool. */
 	if (*prealloc)
-		depot_keep_next_pool(prealloc);
+		depot_keep_new_pool(prealloc);
 	return true;
 }
 
@@ -300,7 +299,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 	struct stack_record *stack;
 	size_t required_size = DEPOT_STACK_RECORD_SIZE;
 
-	/* Update current and next pools if required and possible. */
+	/* Update current and new pools if required and possible. */
 	if (!depot_update_pools(required_size, prealloc))
 		return NULL;
 
@@ -432,13 +431,13 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 		goto exit;
 
 	/*
-	 * Check if another stack pool needs to be initialized. If so, allocate
-	 * the memory now - we won't be able to do that under the lock.
+	 * Check if another stack pool needs to be allocated. If so, allocate
+	 * the memory now: we won't be able to do that under the lock.
 	 *
 	 * smp_load_acquire pairs with smp_store_release
-	 * in depot_update_pools and depot_keep_next_pool.
+	 * in depot_update_pools and depot_keep_new_pool.
 	 */
-	if (unlikely(can_alloc && smp_load_acquire(&next_pool_required))) {
+	if (unlikely(can_alloc && smp_load_acquire(&new_pool_required))) {
 		/*
 		 * Zero out zone modifiers, as we don't have specific zone
 		 * requirements. Keep the flags related to allocation in atomic
@@ -471,9 +470,9 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 	} else if (prealloc) {
 		/*
 		 * Stack depot already contains this stack trace, but let's
-		 * keep the preallocated memory for the future.
+		 * keep the preallocated memory for future.
 		 */
-		depot_keep_next_pool(&prealloc);
+		depot_keep_new_pool(&prealloc);
 	}
 
 	raw_spin_unlock_irqrestore(&pool_lock, flags);
-- 
2.25.1



* [PATCH 09/15] stackdepot: store next pool pointer in new_pool
From: andrey.konovalov @ 2023-08-29 17:11 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko
  Cc: Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka, kasan-dev,
	Evgenii Stepanov, Andrew Morton, linux-mm, linux-kernel,
	Andrey Konovalov

From: Andrey Konovalov <andreyknvl@google.com>

Instead of using the last pointer in stack_pools for storing the pointer
to a new pool (which does not yet store any stack records), use a
separate new_pool variable.

This is a purely code readability change: it seems more logical to store
the pointer to a pool with a special meaning in a dedicated variable.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 lib/stackdepot.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 11934ea3b1c2..5982ea79939d 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -86,6 +86,8 @@ static unsigned int stack_hash_mask;
 
 /* Array of memory regions that store stack traces. */
 static void *stack_pools[DEPOT_MAX_POOLS];
+/* Newly allocated pool that is not yet added to stack_pools. */
+static void *new_pool;
 /* Currently used pool in stack_pools. */
 static int pool_index;
 /* Offset to the unused space in the currently used pool. */
@@ -236,7 +238,7 @@ static void depot_keep_new_pool(void **prealloc)
 	 * as long as we do not exceed the maximum number of pools.
 	 */
 	if (pool_index + 1 < DEPOT_MAX_POOLS) {
-		stack_pools[pool_index + 1] = *prealloc;
+		new_pool = *prealloc;
 		*prealloc = NULL;
 	}
 
@@ -266,6 +268,8 @@ static bool depot_update_pools(size_t required_size, void **prealloc)
 		 * stack_depot_fetch.
 		 */
 		WRITE_ONCE(pool_index, pool_index + 1);
+		stack_pools[pool_index] = new_pool;
+		new_pool = NULL;
 		pool_offset = 0;
 
 		/*
-- 
2.25.1



* [PATCH 10/15] stackdepot: store free stack records in a freelist
From: andrey.konovalov @ 2023-08-29 17:11 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko
  Cc: Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka, kasan-dev,
	Evgenii Stepanov, Andrew Morton, linux-mm, linux-kernel,
	Andrey Konovalov

From: Andrey Konovalov <andreyknvl@google.com>

Instead of using the global pool_offset variable to find a free slot
when storing a new stack record, maintain a freelist of free slots
within the allocated stack pools.

A global next_stack variable is used as the head of the freelist, and
the next field in the stack_record struct is reused as the freelist link
(when the record is not in the freelist, this field is used as a link
in the hash table).

This is a preparatory patch for implementing the eviction of stack records
from the stack depot.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 lib/stackdepot.c | 130 +++++++++++++++++++++++++++++------------------
 1 file changed, 81 insertions(+), 49 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 5982ea79939d..9011f4adcf20 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -55,8 +55,8 @@ union handle_parts {
 };
 
 struct stack_record {
-	struct stack_record *next;	/* Link in the hash table */
-	u32 hash;			/* Hash in the hash table */
+	struct stack_record *next;	/* Link in hash table or freelist */
+	u32 hash;			/* Hash in hash table */
 	u32 size;			/* Number of stored frames */
 	union handle_parts handle;
 	unsigned long entries[DEPOT_STACK_MAX_FRAMES];	/* Frames */
@@ -88,10 +88,10 @@ static unsigned int stack_hash_mask;
 static void *stack_pools[DEPOT_MAX_POOLS];
 /* Newly allocated pool that is not yet added to stack_pools. */
 static void *new_pool;
-/* Currently used pool in stack_pools. */
-static int pool_index;
-/* Offset to the unused space in the currently used pool. */
-static size_t pool_offset;
+/* Number of pools in stack_pools. */
+static int pools_num;
+/* Next stack in the freelist of stack records within stack_pools. */
+static struct stack_record *next_stack;
 /* Lock that protects the variables above. */
 static DEFINE_RAW_SPINLOCK(pool_lock);
 /*
@@ -221,6 +221,41 @@ int stack_depot_init(void)
 }
 EXPORT_SYMBOL_GPL(stack_depot_init);
 
+/* Initializes a stack depol pool. */
+static void depot_init_pool(void *pool)
+{
+	const int records_in_pool = DEPOT_POOL_SIZE / DEPOT_STACK_RECORD_SIZE;
+	int i, offset;
+
+	/* Initialize handles and link stack records to each other. */
+	for (i = 0, offset = 0; offset < DEPOT_POOL_SIZE;
+	     i++, offset += DEPOT_STACK_RECORD_SIZE) {
+		struct stack_record *stack = pool + offset;
+
+		stack->handle.pool_index = pools_num;
+		stack->handle.offset = offset >> DEPOT_STACK_ALIGN;
+		stack->handle.extra = 0;
+
+		if (i < records_in_pool - 1)
+			stack->next = (void *)stack + DEPOT_STACK_RECORD_SIZE;
+		else
+			stack->next = NULL;
+	}
+
+	/* Link stack records into the freelist. */
+	WARN_ON(next_stack);
+	next_stack = pool;
+
+	/* Save reference to the pool to be used by depot_fetch_stack. */
+	stack_pools[pools_num] = pool;
+
+	/*
+	 * WRITE_ONCE pairs with potential concurrent read in
+	 * depot_fetch_stack.
+	 */
+	WRITE_ONCE(pools_num, pools_num + 1);
+}
+
 /* Keeps the preallocated memory to be used for a new stack depot pool. */
 static void depot_keep_new_pool(void **prealloc)
 {
@@ -237,7 +272,7 @@ static void depot_keep_new_pool(void **prealloc)
 	 * Use the preallocated memory for the new pool
 	 * as long as we do not exceed the maximum number of pools.
 	 */
-	if (pool_index + 1 < DEPOT_MAX_POOLS) {
+	if (pools_num < DEPOT_MAX_POOLS) {
 		new_pool = *prealloc;
 		*prealloc = NULL;
 	}
@@ -252,45 +287,42 @@ static void depot_keep_new_pool(void **prealloc)
 }
 
 /* Updates refences to the current and the next stack depot pools. */
-static bool depot_update_pools(size_t required_size, void **prealloc)
+static bool depot_update_pools(void **prealloc)
 {
-	/* Check if there is not enough space in the current pool. */
-	if (unlikely(pool_offset + required_size > DEPOT_POOL_SIZE)) {
-		/* Bail out if we reached the pool limit. */
-		if (unlikely(pool_index + 1 >= DEPOT_MAX_POOLS)) {
-			WARN_ONCE(1, "Stack depot reached limit capacity");
-			return false;
-		}
+	/* Check if we still have objects in the freelist. */
+	if (next_stack)
+		goto out_keep_prealloc;
 
-		/*
-		 * Move on to the new pool.
-		 * WRITE_ONCE pairs with potential concurrent read in
-		 * stack_depot_fetch.
-		 */
-		WRITE_ONCE(pool_index, pool_index + 1);
-		stack_pools[pool_index] = new_pool;
+	/* Check if we have a new pool saved and use it. */
+	if (new_pool) {
+		depot_init_pool(new_pool);
 		new_pool = NULL;
-		pool_offset = 0;
 
-		/*
-		 * If the maximum number of pools is not reached, take note
-		 * that yet another new pool needs to be allocated.
-		 * smp_store_release pairs with smp_load_acquire in
-		 * stack_depot_save.
-		 */
-		if (pool_index + 1 < DEPOT_MAX_POOLS)
+		/* Take note that we might need a new new_pool. */
+		if (pools_num < DEPOT_MAX_POOLS)
 			smp_store_release(&new_pool_required, 1);
+
+		/* Try keeping the preallocated memory for new_pool. */
+		goto out_keep_prealloc;
+	}
+
+	/* Bail out if we reached the pool limit. */
+	if (unlikely(pools_num >= DEPOT_MAX_POOLS)) {
+		WARN_ONCE(1, "Stack depot reached limit capacity");
+		return false;
 	}
 
-	/* Check if the current pool is not yet allocated. */
-	if (*prealloc && stack_pools[pool_index] == NULL) {
-		/* Use the preallocated memory for the current pool. */
-		stack_pools[pool_index] = *prealloc;
+	/* Check if we have preallocated memory and use it. */
+	if (*prealloc) {
+		depot_init_pool(*prealloc);
 		*prealloc = NULL;
 		return true;
 	}
 
-	/* Otherwise, try using the preallocated memory for a new pool. */
+	return false;
+
+out_keep_prealloc:
+	/* Keep the preallocated memory for a new pool if required. */
 	if (*prealloc)
 		depot_keep_new_pool(prealloc);
 	return true;
@@ -301,35 +333,35 @@ static struct stack_record *
 depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 {
 	struct stack_record *stack;
-	size_t required_size = DEPOT_STACK_RECORD_SIZE;
 
 	/* Update current and new pools if required and possible. */
-	if (!depot_update_pools(required_size, prealloc))
+	if (!depot_update_pools(prealloc))
 		return NULL;
 
-	/* Check if we have a pool to save the stack trace. */
-	if (stack_pools[pool_index] == NULL)
+	/* Check if we have a stack record to save the stack trace. */
+	stack = next_stack;
+	if (!stack)
 		return NULL;
 
+	/* Advance the freelist. */
+	next_stack = stack->next;
+
 	/* Limit number of saved frames to DEPOT_STACK_MAX_FRAMES. */
 	if (size > DEPOT_STACK_MAX_FRAMES)
 		size = DEPOT_STACK_MAX_FRAMES;
 
 	/* Save the stack trace. */
-	stack = stack_pools[pool_index] + pool_offset;
+	stack->next = NULL;
 	stack->hash = hash;
 	stack->size = size;
-	stack->handle.pool_index = pool_index;
-	stack->handle.offset = pool_offset >> DEPOT_STACK_ALIGN;
-	stack->handle.extra = 0;
+	/* stack->handle is already filled in by depot_init_pool. */
 	memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
-	pool_offset += required_size;
 
 	/*
 	 * Let KMSAN know the stored stack record is initialized. This shall
 	 * prevent false positive reports if instrumented code accesses it.
 	 */
-	kmsan_unpoison_memory(stack, required_size);
+	kmsan_unpoison_memory(stack, DEPOT_STACK_RECORD_SIZE);
 
 	return stack;
 }
@@ -339,16 +371,16 @@ static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
 	union handle_parts parts = { .handle = handle };
 	/*
 	 * READ_ONCE pairs with potential concurrent write in
-	 * depot_update_pools.
+	 * depot_init_pool.
 	 */
-	int pool_index_cached = READ_ONCE(pool_index);
+	int pools_num_cached = READ_ONCE(pools_num);
 	void *pool;
 	size_t offset = parts.offset << DEPOT_STACK_ALIGN;
 	struct stack_record *stack;
 
-	if (parts.pool_index > pool_index_cached) {
+	if (parts.pool_index > pools_num_cached) {
 		WARN(1, "pool index %d out of bounds (%d) for stack id %08x\n",
-			parts.pool_index, pool_index_cached, handle);
+			parts.pool_index, pools_num_cached, handle);
 		return NULL;
 	}
 
-- 
2.25.1



* [PATCH 11/15] stackdepot: use read/write lock
From: andrey.konovalov @ 2023-08-29 17:11 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko
  Cc: Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka, kasan-dev,
	Evgenii Stepanov, Andrew Morton, linux-mm, linux-kernel,
	Andrey Konovalov

From: Andrey Konovalov <andreyknvl@google.com>

Currently, stack depot uses the following locking scheme:

1. Lock-free accesses when looking up a stack record, which allows
   multiple users to look up records in parallel;
2. Spinlock for protecting the stack depot pools and the hash table
   when adding a new record.

For implementing the eviction of stack traces from stack depot, the
lock-free approach is not going to work anymore, as we will need to be
able to also remove records from the hash table.

Convert the spinlock into a read/write lock, and drop the atomic accesses,
as they are no longer required.

Looking up stack traces is now protected by the read lock; adding new
records is protected by the write lock. One of the following patches will
add a new function for evicting stack records, which will be protected by
the write lock as well.

With this change, multiple users can still look up records in parallel.

This is a preparatory patch for implementing the eviction of stack records
from the stack depot.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 lib/stackdepot.c | 76 ++++++++++++++++++++++--------------------------
 1 file changed, 35 insertions(+), 41 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 9011f4adcf20..5ad454367379 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -23,6 +23,7 @@
 #include <linux/percpu.h>
 #include <linux/printk.h>
 #include <linux/slab.h>
+#include <linux/spinlock.h>
 #include <linux/stacktrace.h>
 #include <linux/stackdepot.h>
 #include <linux/string.h>
@@ -92,15 +93,15 @@ static void *new_pool;
 static int pools_num;
 /* Next stack in the freelist of stack records within stack_pools. */
 static struct stack_record *next_stack;
-/* Lock that protects the variables above. */
-static DEFINE_RAW_SPINLOCK(pool_lock);
 /*
  * Stack depot tries to keep an extra pool allocated even before it runs out
  * of space in the currently used pool. This flag marks whether this extra pool
  * needs to be allocated. It has the value 0 when either an extra pool is not
  * yet allocated or if the limit on the number of pools is reached.
  */
-static int new_pool_required = 1;
+static bool new_pool_required = true;
+/* Lock that protects the variables above. */
+static DEFINE_RWLOCK(pool_rwlock);
 
 static int __init disable_stack_depot(char *str)
 {
@@ -248,12 +249,7 @@ static void depot_init_pool(void *pool)
 
 	/* Save reference to the pool to be used by depot_fetch_stack. */
 	stack_pools[pools_num] = pool;
-
-	/*
-	 * WRITE_ONCE pairs with potential concurrent read in
-	 * depot_fetch_stack.
-	 */
-	WRITE_ONCE(pools_num, pools_num + 1);
+	pools_num++;
 }
 
 /* Keeps the preallocated memory to be used for a new stack depot pool. */
@@ -262,10 +258,8 @@ static void depot_keep_new_pool(void **prealloc)
 	/*
 	 * If a new pool is already saved or the maximum number of
 	 * pools is reached, do not use the preallocated memory.
-	 * READ_ONCE is only used to mark the variable as atomic,
-	 * there are no concurrent writes.
 	 */
-	if (!READ_ONCE(new_pool_required))
+	if (!new_pool_required)
 		return;
 
 	/*
@@ -281,9 +275,8 @@ static void depot_keep_new_pool(void **prealloc)
 	 * At this point, either a new pool is kept or the maximum
 	 * number of pools is reached. In either case, take note that
 	 * keeping another pool is not required.
-	 * smp_store_release pairs with smp_load_acquire in stack_depot_save.
 	 */
-	smp_store_release(&new_pool_required, 0);
+	new_pool_required = false;
 }
 
 /* Updates refences to the current and the next stack depot pools. */
@@ -300,7 +293,7 @@ static bool depot_update_pools(void **prealloc)
 
 		/* Take note that we might need a new new_pool. */
 		if (pools_num < DEPOT_MAX_POOLS)
-			smp_store_release(&new_pool_required, 1);
+			new_pool_required = true;
 
 		/* Try keeping the preallocated memory for new_pool. */
 		goto out_keep_prealloc;
@@ -369,18 +362,13 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
 {
 	union handle_parts parts = { .handle = handle };
-	/*
-	 * READ_ONCE pairs with potential concurrent write in
-	 * depot_init_pool.
-	 */
-	int pools_num_cached = READ_ONCE(pools_num);
 	void *pool;
 	size_t offset = parts.offset << DEPOT_STACK_ALIGN;
 	struct stack_record *stack;
 
-	if (parts.pool_index > pools_num_cached) {
+	if (parts.pool_index > pools_num) {
 		WARN(1, "pool index %d out of bounds (%d) for stack id %08x\n",
-			parts.pool_index, pools_num_cached, handle);
+			parts.pool_index, pools_num, handle);
 		return NULL;
 	}
 
@@ -439,6 +427,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 	depot_stack_handle_t handle = 0;
 	struct page *page = NULL;
 	void *prealloc = NULL;
+	bool need_alloc = false;
 	unsigned long flags;
 	u32 hash;
 
@@ -458,22 +447,26 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 	hash = hash_stack(entries, nr_entries);
 	bucket = &stack_table[hash & stack_hash_mask];
 
-	/*
-	 * Fast path: look the stack trace up without locking.
-	 * smp_load_acquire pairs with smp_store_release to |bucket| below.
-	 */
-	found = find_stack(smp_load_acquire(bucket), entries, nr_entries, hash);
-	if (found)
+	read_lock_irqsave(&pool_rwlock, flags);
+
+	/* Fast path: look the stack trace up without full locking. */
+	found = find_stack(*bucket, entries, nr_entries, hash);
+	if (found) {
+		read_unlock_irqrestore(&pool_rwlock, flags);
 		goto exit;
+	}
+
+	/* Take note if another stack pool needs to be allocated. */
+	if (new_pool_required)
+		need_alloc = true;
+
+	read_unlock_irqrestore(&pool_rwlock, flags);
 
 	/*
-	 * Check if another stack pool needs to be allocated. If so, allocate
-	 * the memory now: we won't be able to do that under the lock.
-	 *
-	 * smp_load_acquire pairs with smp_store_release
-	 * in depot_update_pools and depot_keep_new_pool.
+	 * Allocate memory for a new pool if required now:
+	 * we won't be able to do that under the lock.
 	 */
-	if (unlikely(can_alloc && smp_load_acquire(&new_pool_required))) {
+	if (unlikely(can_alloc && need_alloc)) {
 		/*
 		 * Zero out zone modifiers, as we don't have specific zone
 		 * requirements. Keep the flags related to allocation in atomic
@@ -487,7 +480,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 			prealloc = page_address(page);
 	}
 
-	raw_spin_lock_irqsave(&pool_lock, flags);
+	write_lock_irqsave(&pool_rwlock, flags);
 
 	found = find_stack(*bucket, entries, nr_entries, hash);
 	if (!found) {
@@ -496,11 +489,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 
 		if (new) {
 			new->next = *bucket;
-			/*
-			 * smp_store_release pairs with smp_load_acquire
-			 * from |bucket| above.
-			 */
-			smp_store_release(bucket, new);
+			*bucket = new;
 			found = new;
 		}
 	} else if (prealloc) {
@@ -511,7 +500,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 		depot_keep_new_pool(&prealloc);
 	}
 
-	raw_spin_unlock_irqrestore(&pool_lock, flags);
+	write_unlock_irqrestore(&pool_rwlock, flags);
 exit:
 	if (prealloc) {
 		/* Stack depot didn't use this memory, free it. */
@@ -535,6 +524,7 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle,
 			       unsigned long **entries)
 {
 	struct stack_record *stack;
+	unsigned long flags;
 
 	*entries = NULL;
 	/*
@@ -546,8 +536,12 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle,
 	if (!handle || stack_depot_disabled)
 		return 0;
 
+	read_lock_irqsave(&pool_rwlock, flags);
+
 	stack = depot_fetch_stack(handle);
 
+	read_unlock_irqrestore(&pool_rwlock, flags);
+
 	*entries = stack->entries;
 	return stack->size;
 }
-- 
2.25.1



* [PATCH 12/15] stackdepot: add refcount for records
From: andrey.konovalov @ 2023-08-29 17:11 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko
  Cc: Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka, kasan-dev,
	Evgenii Stepanov, Andrew Morton, linux-mm, linux-kernel,
	Andrey Konovalov

From: Andrey Konovalov <andreyknvl@google.com>

Add a reference counter for how many times a stack record has been added
to stack depot.

Do not yet decrement the refcount; this is implemented in one of the
following patches.

This is a preparatory patch for implementing the eviction of stack records
from the stack depot.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 lib/stackdepot.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 5ad454367379..a84c0debbb9e 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -22,6 +22,7 @@
 #include <linux/mutex.h>
 #include <linux/percpu.h>
 #include <linux/printk.h>
+#include <linux/refcount.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/stacktrace.h>
@@ -60,6 +61,7 @@ struct stack_record {
 	u32 hash;			/* Hash in hash table */
 	u32 size;			/* Number of stored frames */
 	union handle_parts handle;
+	refcount_t count;
 	unsigned long entries[DEPOT_STACK_MAX_FRAMES];	/* Frames */
 };
 
@@ -348,6 +350,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 	stack->hash = hash;
 	stack->size = size;
 	/* stack->handle is already filled in by depot_init_pool. */
+	refcount_set(&stack->count, 1);
 	memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
 
 	/*
@@ -452,6 +455,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 	/* Fast path: look the stack trace up without full locking. */
 	found = find_stack(*bucket, entries, nr_entries, hash);
 	if (found) {
+		refcount_inc(&found->count);
 		read_unlock_irqrestore(&pool_rwlock, flags);
 		goto exit;
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 13/15] stackdepot: add backwards links to hash table buckets
  2023-08-29 17:11 [PATCH 00/15] stackdepot: allow evicting stack traces andrey.konovalov
                   ` (11 preceding siblings ...)
  2023-08-29 17:11 ` [PATCH 12/15] stackdepot: add refcount for records andrey.konovalov
@ 2023-08-29 17:11 ` andrey.konovalov
  2023-08-30  9:24   ` Marco Elver
  2023-08-29 17:11 ` [PATCH 14/15] stackdepot: allow users to evict stack traces andrey.konovalov
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: andrey.konovalov @ 2023-08-29 17:11 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko
  Cc: Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka, kasan-dev,
	Evgenii Stepanov, Andrew Morton, linux-mm, linux-kernel,
	Andrey Konovalov

From: Andrey Konovalov <andreyknvl@google.com>

Maintain links in the stack records to previous entries within the
hash table buckets.

This is a preparatory patch for implementing the eviction of stack records
from the stack depot.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 lib/stackdepot.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index a84c0debbb9e..641db97d8c7c 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -58,6 +58,7 @@ union handle_parts {
 
 struct stack_record {
 	struct stack_record *next;	/* Link in hash table or freelist */
+	struct stack_record *prev;	/* Link in hash table */
 	u32 hash;			/* Hash in hash table */
 	u32 size;			/* Number of stored frames */
 	union handle_parts handle;
@@ -493,6 +494,9 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 
 		if (new) {
 			new->next = *bucket;
+			new->prev = NULL;
+			if (*bucket)
+				(*bucket)->prev = new;
 			*bucket = new;
 			found = new;
 		}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 14/15] stackdepot: allow users to evict stack traces
  2023-08-29 17:11 [PATCH 00/15] stackdepot: allow evicting stack traces andrey.konovalov
                   ` (12 preceding siblings ...)
  2023-08-29 17:11 ` [PATCH 13/15] stackdepot: add backwards links to hash table buckets andrey.konovalov
@ 2023-08-29 17:11 ` andrey.konovalov
  2023-08-30  9:20   ` Marco Elver
  2023-08-29 17:11 ` [PATCH 15/15] kasan: use stack_depot_evict for tag-based modes andrey.konovalov
  2023-08-30  7:46 ` [PATCH 00/15] stackdepot: allow evicting stack traces Vlastimil Babka
  15 siblings, 1 reply; 50+ messages in thread
From: andrey.konovalov @ 2023-08-29 17:11 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko
  Cc: Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka, kasan-dev,
	Evgenii Stepanov, Andrew Morton, linux-mm, linux-kernel,
	Andrey Konovalov

From: Andrey Konovalov <andreyknvl@google.com>

Add stack_depot_evict, a function that decrements a reference counter
on a stack record and removes it from the stack depot once the counter
reaches 0.

Internally, when removing a stack record, the function unlinks it from
the hash table bucket and returns it to the freelist.

With this change, the users of stack depot can call stack_depot_evict
when keeping a stack trace in the stack depot is not needed anymore.
This avoids polluting the stack depot with irrelevant stack traces and
thus leaves more space to store the relevant ones before the stack depot
reaches its capacity.
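
A minimal usage sketch (illustrative only, not taken from an existing
stack depot user):

	depot_stack_handle_t handle;

	/* Each successful save takes a reference on the stack record. */
	handle = stack_depot_save(entries, nr_entries, GFP_NOWAIT);

	/* ... later, once the stack trace is no longer relevant ... */
	if (handle)
		stack_depot_evict(handle);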

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 include/linux/stackdepot.h | 11 ++++++++++
 lib/stackdepot.c           | 43 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index e58306783d8e..b14da6797714 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -121,6 +121,17 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
 unsigned int stack_depot_fetch(depot_stack_handle_t handle,
 			       unsigned long **entries);
 
+/**
+ * stack_depot_evict - Drop a reference to a stack trace from stack depot
+ *
+ * @handle:	Stack depot handle returned from stack_depot_save()
+ *
+ * The stack trace gets fully removed from stack depot once all references
+ * to it has been dropped (once the number of stack_depot_evict calls matches
+ * the number of stack_depot_save calls for this stack trace).
+ */
+void stack_depot_evict(depot_stack_handle_t handle);
+
 /**
  * stack_depot_print - Print a stack trace from stack depot
  *
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 641db97d8c7c..cf28720b842d 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -384,6 +384,13 @@ static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
 	return stack;
 }
 
+/* Frees stack into the freelist. */
+static void depot_free_stack(struct stack_record *stack)
+{
+	stack->next = next_stack;
+	next_stack = stack;
+}
+
 /* Calculates the hash for a stack. */
 static inline u32 hash_stack(unsigned long *entries, unsigned int size)
 {
@@ -555,6 +562,42 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle,
 }
 EXPORT_SYMBOL_GPL(stack_depot_fetch);
 
+void stack_depot_evict(depot_stack_handle_t handle)
+{
+	struct stack_record *stack, **bucket;
+	unsigned long flags;
+
+	if (!handle || stack_depot_disabled)
+		return;
+
+	write_lock_irqsave(&pool_rwlock, flags);
+
+	stack = depot_fetch_stack(handle);
+	if (WARN_ON(!stack))
+		goto out;
+
+	if (refcount_dec_and_test(&stack->count)) {
+		/* Drop stack from the hash table. */
+		if (stack->next)
+			stack->next->prev = stack->prev;
+		if (stack->prev)
+			stack->prev->next = stack->next;
+		else {
+			bucket = &stack_table[stack->hash & stack_hash_mask];
+			*bucket = stack->next;
+		}
+		stack->next = NULL;
+		stack->prev = NULL;
+
+		/* Free stack. */
+		depot_free_stack(stack);
+	}
+
+out:
+	write_unlock_irqrestore(&pool_rwlock, flags);
+}
+EXPORT_SYMBOL_GPL(stack_depot_evict);
+
 void stack_depot_print(depot_stack_handle_t stack)
 {
 	unsigned long *entries;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 15/15] kasan: use stack_depot_evict for tag-based modes
  2023-08-29 17:11 [PATCH 00/15] stackdepot: allow evicting stack traces andrey.konovalov
                   ` (13 preceding siblings ...)
  2023-08-29 17:11 ` [PATCH 14/15] stackdepot: allow users to evict stack traces andrey.konovalov
@ 2023-08-29 17:11 ` andrey.konovalov
  2023-08-30  9:38   ` Marco Elver
  2023-08-30  7:46 ` [PATCH 00/15] stackdepot: allow evicting stack traces Vlastimil Babka
  15 siblings, 1 reply; 50+ messages in thread
From: andrey.konovalov @ 2023-08-29 17:11 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko
  Cc: Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka, kasan-dev,
	Evgenii Stepanov, Andrew Morton, linux-mm, linux-kernel,
	Andrey Konovalov

From: Andrey Konovalov <andreyknvl@google.com>

Evict stack traces from the stack depot for the tag-based KASAN modes
once they are evicted from the stack ring.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
 mm/kasan/tags.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c
index 7dcfe341d48e..fa6b0f77a7dd 100644
--- a/mm/kasan/tags.c
+++ b/mm/kasan/tags.c
@@ -96,7 +96,7 @@ static void save_stack_info(struct kmem_cache *cache, void *object,
 			gfp_t gfp_flags, bool is_free)
 {
 	unsigned long flags;
-	depot_stack_handle_t stack;
+	depot_stack_handle_t stack, old_stack;
 	u64 pos;
 	struct kasan_stack_ring_entry *entry;
 	void *old_ptr;
@@ -120,6 +120,8 @@ static void save_stack_info(struct kmem_cache *cache, void *object,
 	if (!try_cmpxchg(&entry->ptr, &old_ptr, STACK_RING_BUSY_PTR))
 		goto next; /* Busy slot. */
 
+	old_stack = READ_ONCE(entry->stack);
+
 	WRITE_ONCE(entry->size, cache->object_size);
 	WRITE_ONCE(entry->pid, current->pid);
 	WRITE_ONCE(entry->stack, stack);
@@ -131,6 +133,9 @@ static void save_stack_info(struct kmem_cache *cache, void *object,
 	smp_store_release(&entry->ptr, (s64)object);
 
 	read_unlock_irqrestore(&stack_ring.lock, flags);
+
+	if (old_stack)
+		stack_depot_evict(old_stack);
 }
 
 void kasan_save_alloc_info(struct kmem_cache *cache, void *object, gfp_t flags)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 01/15] stackdepot: check disabled flag when fetching
  2023-08-29 17:11 ` [PATCH 01/15] stackdepot: check disabled flag when fetching andrey.konovalov
@ 2023-08-30  7:40   ` Alexander Potapenko
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Potapenko @ 2023-08-30  7:40 UTC (permalink / raw)
  To: andrey.konovalov
  Cc: Marco Elver, Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka,
	kasan-dev, Evgenii Stepanov, Andrew Morton, linux-mm,
	linux-kernel, Andrey Konovalov

On Tue, Aug 29, 2023 at 7:11 PM <andrey.konovalov@linux.dev> wrote:
>
> From: Andrey Konovalov <andreyknvl@google.com>
>
> Do not try fetching a stack trace from the stack depot if the
> stack_depot_disabled flag is enabled.
>
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 02/15] stackdepot: simplify __stack_depot_save
  2023-08-29 17:11 ` [PATCH 02/15] stackdepot: simplify __stack_depot_save andrey.konovalov
@ 2023-08-30  7:41   ` Alexander Potapenko
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Potapenko @ 2023-08-30  7:41 UTC (permalink / raw)
  To: andrey.konovalov
  Cc: Marco Elver, Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka,
	kasan-dev, Evgenii Stepanov, Andrew Morton, linux-mm,
	linux-kernel, Andrey Konovalov

On Tue, Aug 29, 2023 at 7:11 PM <andrey.konovalov@linux.dev> wrote:
>
> From: Andrey Konovalov <andreyknvl@google.com>
>
> The retval local variable in __stack_depot_save has the union type
> handle_parts, but the function never uses anything but the union's
> handle field.
>
> Define retval simply as depot_stack_handle_t to simplify the code.
>
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 03/15] stackdepot: drop valid bit from handles
  2023-08-29 17:11 ` [PATCH 03/15] stackdepot: drop valid bit from handles andrey.konovalov
@ 2023-08-30  7:43   ` Alexander Potapenko
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Potapenko @ 2023-08-30  7:43 UTC (permalink / raw)
  To: andrey.konovalov
  Cc: Marco Elver, Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka,
	kasan-dev, Evgenii Stepanov, Andrew Morton, linux-mm,
	linux-kernel, Andrey Konovalov

On Tue, Aug 29, 2023 at 7:11 PM <andrey.konovalov@linux.dev> wrote:
>
> From: Andrey Konovalov <andreyknvl@google.com>
>
> Stack depot doesn't use the valid bit in handles in any way, so drop it.

Good catch!

> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/15] stackdepot: allow evicting stack traces
  2023-08-29 17:11 [PATCH 00/15] stackdepot: allow evicting stack traces andrey.konovalov
                   ` (14 preceding siblings ...)
  2023-08-29 17:11 ` [PATCH 15/15] kasan: use stack_depot_evict for tag-based modes andrey.konovalov
@ 2023-08-30  7:46 ` Vlastimil Babka
  2023-09-04 18:45   ` Andrey Konovalov
  15 siblings, 1 reply; 50+ messages in thread
From: Vlastimil Babka @ 2023-08-30  7:46 UTC (permalink / raw)
  To: andrey.konovalov, Marco Elver, Alexander Potapenko
  Cc: Andrey Konovalov, Dmitry Vyukov, kasan-dev, Evgenii Stepanov,
	Andrew Morton, linux-mm, linux-kernel, Andrey Konovalov

On 8/29/23 19:11, andrey.konovalov@linux.dev wrote:
> From: Andrey Konovalov <andreyknvl@google.com>
> 
> Currently, the stack depot grows indefinitely until it reaches its
> capacity. Once that happens, the stack depot stops saving new stack
> traces.
> 
> This creates a problem for using the stack depot for in-field testing
> and in production.
> 
> For such uses, an ideal stack trace storage should:
> 
> 1. Allow saving fresh stack traces on systems with a large uptime while
>    limiting the amount of memory used to store the traces;
> 2. Have a low performance impact.

I wonder if there's also another thing to consider for the future:

3. With the number of stackdepot users increasing, each having their
distinct set of stacks from others, would it make sense to create a separate
"storage instance" for each user instead of putting everything in a single
shared one?
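
Purely as a sketch of what I mean (all names here are hypothetical):

	struct stack_depot;				/* per-user instance */

	struct stack_depot *stack_depot_create(gfp_t gfp);
	void stack_depot_destroy(struct stack_depot *depot);
	depot_stack_handle_t stack_depot_save_in(struct stack_depot *depot,
						 unsigned long *entries,
						 unsigned int nr_entries,
						 gfp_t gfp);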

In any case, evicting support is a good development, thanks!

> Implementing #1 in the stack depot is impossible with the current
> keep-forever approach. This series targets to address that. Issue #2 is
> left to be addressed in a future series.
> 
> This series changes the stack depot implementation to allow evicting
> unneeded stack traces from the stack depot. The users of the stack depot
> can do that via a new stack_depot_evict API.
> 
> Internal changes to the stack depot code include:
> 
> 1. Storing stack traces in 32-frame-sized slots (vs precisely-sized slots
>    in the current implementation);
> 2. Keeping available slots in a freelist (vs keeping an offset to the next
>    free slot);
> 3. Using a read/write lock for synchronization (vs a lock-free approach
>    combined with a spinlock).
> 
> This series also integrates the eviction functionality in the tag-based
> KASAN modes. (I will investigate integrating it into the Generic mode as
> well in the following iterations of this series.)
> 
> Despite wasting some space on rounding up the size of each stack record
> to 32 frames, with this change, the tag-based KASAN modes end up
> consuming ~5% less memory in stack depot during boot (with the default
> stack ring size of 32k entries). The reason for this is the eviction of
> irrelevant stack traces from the stack depot, which frees up space for
> other stack traces.
> 
> For other tools that heavily rely on the stack depot, like Generic KASAN
> and KMSAN, this change leads to the stack depot capacity being reached
> sooner than before. However, as these tools are mainly used in fuzzing
> scenarios where the kernel is frequently rebooted, this outcome should
> be acceptable.
> 
> There is no measurable boot time performance impact of these changes for
> KASAN on x86-64. I haven't done any tests for arm64 modes (the stack
> depot without performance optimizations is not suitable for intended use
> of those anyway), but I expect a similar result. Obtaining and copying
> stack trace frames when saving them into stack depot is what takes the
> most time.
> 
> This series does not yet provide a way to configure the maximum size of
> the stack depot externally (e.g. via a command-line parameter). This will
> either be added in the following iterations of this series (if the used
> approach gets approval) or will be added together with the performance
> improvement changes.
> 
> Andrey Konovalov (15):
>   stackdepot: check disabled flag when fetching
>   stackdepot: simplify __stack_depot_save
>   stackdepot: drop valid bit from handles
>   stackdepot: add depot_fetch_stack helper
>   stackdepot: use fixed-sized slots for stack records
>   stackdepot: fix and clean-up atomic annotations
>   stackdepot: rework helpers for depot_alloc_stack
>   stackdepot: rename next_pool_required to new_pool_required
>   stackdepot: store next pool pointer in new_pool
>   stackdepot: store free stack records in a freelist
>   stackdepot: use read/write lock
>   stackdepot: add refcount for records
>   stackdepot: add backwards links to hash table buckets
>   stackdepot: allow users to evict stack traces
>   kasan: use stack_depot_evict for tag-based modes
> 
>  include/linux/stackdepot.h |  11 ++
>  lib/stackdepot.c           | 361 ++++++++++++++++++++++++-------------
>  mm/kasan/tags.c            |   7 +-
>  3 files changed, 249 insertions(+), 130 deletions(-)
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 04/15] stackdepot: add depot_fetch_stack helper
  2023-08-29 17:11 ` [PATCH 04/15] stackdepot: add depot_fetch_stack helper andrey.konovalov
@ 2023-08-30  7:47   ` Alexander Potapenko
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Potapenko @ 2023-08-30  7:47 UTC (permalink / raw)
  To: andrey.konovalov
  Cc: Marco Elver, Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka,
	kasan-dev, Evgenii Stepanov, Andrew Morton, linux-mm,
	linux-kernel, Andrey Konovalov

On Tue, Aug 29, 2023 at 7:11 PM <andrey.konovalov@linux.dev> wrote:
>
> From: Andrey Konovalov <andreyknvl@google.com>
>
> Add a helper depot_fetch_stack function that fetches the pointer to
> a stack record.
>
> With this change, all static depot_* functions now operate on stack pools
> and the exported stack_depot_* functions operate on the hash table.
>
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>

(one nit below)


> +static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
> +{
> +       union handle_parts parts = { .handle = handle };
> +       /*
> +        * READ_ONCE pairs with potential concurrent write in
> +        * depot_alloc_stack.
Nit: please change to "depot_alloc_stack()" for consistency with the
rest of the comments.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 05/15] stackdepot: use fixed-sized slots for stack records
  2023-08-29 17:11 ` [PATCH 05/15] stackdepot: use fixed-sized slots for stack records andrey.konovalov
@ 2023-08-30  8:21   ` Alexander Potapenko
  2023-09-13 17:07     ` Andrey Konovalov
  0 siblings, 1 reply; 50+ messages in thread
From: Alexander Potapenko @ 2023-08-30  8:21 UTC (permalink / raw)
  To: andrey.konovalov
  Cc: Marco Elver, Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka,
	kasan-dev, Evgenii Stepanov, Andrew Morton, linux-mm,
	linux-kernel, Andrey Konovalov

On Tue, Aug 29, 2023 at 7:11 PM <andrey.konovalov@linux.dev> wrote:
>
> From: Andrey Konovalov <andreyknvl@google.com>
>
> Instead of storing stack records in stack depot pools one right after
> another, use 32-frame-sized slots.

I am slightly concerned about the KMSAN use case here, which defines
KMSAN_STACK_DEPTH to 64.
I don't have a comprehensive stack depth breakdown, but a quick poke
around syzkaller.appspot.com shows several cases where the stacks are
actually longer than 32 frames.
Can you add a config parameter for the stack depth instead of
mandating 32 frames everywhere?
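
For instance (CONFIG_STACKDEPOT_MAX_FRAMES here is a new, hypothetical
Kconfig knob, with a bigger default when KMSAN is enabled):

	/* Instead of hard-coding the slot size to 32 frames: */
	#define DEPOT_STACK_MAX_FRAMES	CONFIG_STACKDEPOT_MAX_FRAMES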

As a side note, kmsan_internal_chain_origin()
(https://elixir.bootlin.com/linux/latest/source/mm/kmsan/core.c#L214)
creates small 3-frame records in the stack depot to link two stacks
together, which will add unnecessary stackdepot pressure.
But this can be fixed by storing both the new stack trace and the link
to the old stack trace in the same record.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 08/15] stackdepot: rename next_pool_required to new_pool_required
  2023-08-29 17:11 ` [PATCH 08/15] stackdepot: rename next_pool_required to new_pool_required andrey.konovalov
@ 2023-08-30  8:29   ` Alexander Potapenko
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Potapenko @ 2023-08-30  8:29 UTC (permalink / raw)
  To: andrey.konovalov
  Cc: Marco Elver, Andrey Konovalov, Dmitry Vyukov, Vlastimil Babka,
	kasan-dev, Evgenii Stepanov, Andrew Morton, linux-mm,
	linux-kernel, Andrey Konovalov

On Tue, Aug 29, 2023 at 7:12 PM <andrey.konovalov@linux.dev> wrote:
>
> From: Andrey Konovalov <andreyknvl@google.com>
>
> Rename next_pool_required to new_pool_required.
>
> This is a purely code readability change: the following patch will change
> stack depot to store the pointer to the new pool in a separate variable,
> and "new" seems like a more logical name.
>
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/15] stackdepot: fix and clean-up atomic annotations
  2023-08-29 17:11 ` [PATCH 06/15] stackdepot: fix and clean-up atomic annotations andrey.konovalov
@ 2023-08-30  8:34   ` Marco Elver
  2023-09-04 18:45     ` Andrey Konovalov
  0 siblings, 1 reply; 50+ messages in thread
From: Marco Elver @ 2023-08-30  8:34 UTC (permalink / raw)
  To: andrey.konovalov
  Cc: Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Tue, Aug 29, 2023 at 07:11PM +0200, andrey.konovalov@linux.dev wrote:
> From: Andrey Konovalov <andreyknvl@google.com>
> 
> Simplify comments accompanying the use of atomic accesses in the
> stack depot code.
> 
> Also turn smp_load_acquire from next_pool_required in depot_init_pool
> into READ_ONCE, as both depot_init_pool and all the smp_store_release's
> to this variable are executed under the stack depot lock.
> 
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> 
> ---
> 
> This patch is not strictly required, as the atomic accesses are fully
> removed in one of the latter patches. However, I decided to keep the
> patch just in case we end up needing these atomics in the following
> iterations of this series.
> ---
>  lib/stackdepot.c | 27 +++++++++++++--------------
>  1 file changed, 13 insertions(+), 14 deletions(-)
> 
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index 93191ee70fc3..9ae71e1ef1a7 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -226,10 +226,10 @@ static void depot_init_pool(void **prealloc)
>  	/*
>  	 * If the next pool is already initialized or the maximum number of
>  	 * pools is reached, do not use the preallocated memory.
> -	 * smp_load_acquire() here pairs with smp_store_release() below and
> -	 * in depot_alloc_stack().
> +	 * READ_ONCE is only used to mark the variable as atomic,
> +	 * there are no concurrent writes.

This doesn't make sense. If there are no concurrent writes, we should
drop the marking, so that if there are concurrent writes, tools like
KCSAN can tell us about it if our assumption was wrong.

>  	 */
> -	if (!smp_load_acquire(&next_pool_required))
> +	if (!READ_ONCE(next_pool_required))
>  		return;
>  
>  	/* Check if the current pool is not yet allocated. */
> @@ -250,8 +250,8 @@ static void depot_init_pool(void **prealloc)
>  		 * At this point, either the next pool is initialized or the
>  		 * maximum number of pools is reached. In either case, take
>  		 * note that initializing another pool is not required.
> -		 * This smp_store_release pairs with smp_load_acquire() above
> -		 * and in stack_depot_save().
> +		 * smp_store_release pairs with smp_load_acquire in
> +		 * stack_depot_save.
>  		 */
>  		smp_store_release(&next_pool_required, 0);
>  	}
> @@ -275,15 +275,15 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
>  		/*
>  		 * Move on to the next pool.
>  		 * WRITE_ONCE pairs with potential concurrent read in
> -		 * stack_depot_fetch().
> +		 * stack_depot_fetch.
>  		 */
>  		WRITE_ONCE(pool_index, pool_index + 1);
>  		pool_offset = 0;
>  		/*
>  		 * If the maximum number of pools is not reached, take note
>  		 * that the next pool needs to initialized.
> -		 * smp_store_release() here pairs with smp_load_acquire() in
> -		 * stack_depot_save() and depot_init_pool().
> +		 * smp_store_release pairs with smp_load_acquire in
> +		 * stack_depot_save.
>  		 */
>  		if (pool_index + 1 < DEPOT_MAX_POOLS)
>  			smp_store_release(&next_pool_required, 1);
> @@ -414,8 +414,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>  
>  	/*
>  	 * Fast path: look the stack trace up without locking.
> -	 * The smp_load_acquire() here pairs with smp_store_release() to
> -	 * |bucket| below.
> +	 * smp_load_acquire pairs with smp_store_release to |bucket| below.
>  	 */
>  	found = find_stack(smp_load_acquire(bucket), entries, nr_entries, hash);
>  	if (found)
> @@ -425,8 +424,8 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>  	 * Check if another stack pool needs to be initialized. If so, allocate
>  	 * the memory now - we won't be able to do that under the lock.
>  	 *
> -	 * The smp_load_acquire() here pairs with smp_store_release() to
> -	 * |next_pool_inited| in depot_alloc_stack() and depot_init_pool().
> +	 * smp_load_acquire pairs with smp_store_release
> +	 * in depot_alloc_stack and depot_init_pool.

Reflow comment to match 80 cols used by comments elsewhere.

>  	 */
>  	if (unlikely(can_alloc && smp_load_acquire(&next_pool_required))) {
>  		/*
> @@ -452,8 +451,8 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>  		if (new) {
>  			new->next = *bucket;
>  			/*
> -			 * This smp_store_release() pairs with
> -			 * smp_load_acquire() from |bucket| above.
> +			 * smp_store_release pairs with smp_load_acquire
> +			 * from |bucket| above.
>  			 */
>  			smp_store_release(bucket, new);
>  			found = new;
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 11/15] stackdepot: use read/write lock
  2023-08-29 17:11 ` [PATCH 11/15] stackdepot: use read/write lock andrey.konovalov
@ 2023-08-30  9:13   ` Marco Elver
  2023-09-04 18:46     ` Andrey Konovalov
  0 siblings, 1 reply; 50+ messages in thread
From: Marco Elver @ 2023-08-30  9:13 UTC (permalink / raw)
  To: andrey.konovalov
  Cc: Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Tue, Aug 29, 2023 at 07:11PM +0200, andrey.konovalov@linux.dev wrote:
> From: Andrey Konovalov <andreyknvl@google.com>
> 
> Currently, stack depot uses the following locking scheme:
> 
> 1. Lock-free accesses when looking up a stack record, which allows to
>    have multiple users to look up records in parallel;
> 2. Spinlock for protecting the stack depot pools and the hash table
>    when adding a new record.
> 
> For implementing the eviction of stack traces from stack depot, the
> lock-free approach is not going to work anymore, as we will need to be
> able to also remove records from the hash table.
> 
> Convert the spinlock into a read/write lock, and drop the atomic accesses,
> as they are no longer required.
> 
> Looking up stack traces is now protected by the read lock and adding new
> records - by the write lock. One of the following patches will add a new
> function for evicting stack records, which will be protected by the write
> lock as well.
> 
> With this change, multiple users can still look up records in parallel.
> 
> This is preparatory patch for implementing the eviction of stack records
> from the stack depot.
> 
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> ---
>  lib/stackdepot.c | 76 ++++++++++++++++++++++--------------------------
>  1 file changed, 35 insertions(+), 41 deletions(-)
> 
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index 9011f4adcf20..5ad454367379 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -23,6 +23,7 @@
>  #include <linux/percpu.h>
>  #include <linux/printk.h>
>  #include <linux/slab.h>
> +#include <linux/spinlock.h>
>  #include <linux/stacktrace.h>
>  #include <linux/stackdepot.h>
>  #include <linux/string.h>
> @@ -92,15 +93,15 @@ static void *new_pool;
>  static int pools_num;
>  /* Next stack in the freelist of stack records within stack_pools. */
>  static struct stack_record *next_stack;
> -/* Lock that protects the variables above. */
> -static DEFINE_RAW_SPINLOCK(pool_lock);
>  /*
>   * Stack depot tries to keep an extra pool allocated even before it runs out
>   * of space in the currently used pool. This flag marks whether this extra pool
>   * needs to be allocated. It has the value 0 when either an extra pool is not
>   * yet allocated or if the limit on the number of pools is reached.
>   */
> -static int new_pool_required = 1;
> +static bool new_pool_required = true;
> +/* Lock that protects the variables above. */
> +static DEFINE_RWLOCK(pool_rwlock);

Despite this being a rwlock, it'll introduce tons of (cache) contention
for the common case (stack depot entry exists).

If creating new stack depot entries is only common during "warm-up" and
then becomes exceedingly rare, I think a percpu-rwsem (read-lock is a
CPU-local access, but write-locking is expensive) may be preferable.
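
Roughly (sketch only; note that percpu_rw_semaphore is a sleeping lock,
so the current irqsave usage would need to be reconsidered):

	static DEFINE_STATIC_PERCPU_RWSEM(pool_rwsem);

	/* Lookup (common case): the read lock is a CPU-local access. */
	percpu_down_read(&pool_rwsem);
	found = find_stack(*bucket, entries, nr_entries, hash);
	percpu_up_read(&pool_rwsem);

	/* Insertion/eviction (rare): the write lock is expensive. */
	percpu_down_write(&pool_rwsem);
	/* ... modify the hash table and pools ... */
	percpu_up_write(&pool_rwsem);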

>  static int __init disable_stack_depot(char *str)
>  {
> @@ -248,12 +249,7 @@ static void depot_init_pool(void *pool)
>  
>  	/* Save reference to the pool to be used by depot_fetch_stack. */
>  	stack_pools[pools_num] = pool;
> -
> -	/*
> -	 * WRITE_ONCE pairs with potential concurrent read in
> -	 * depot_fetch_stack.
> -	 */
> -	WRITE_ONCE(pools_num, pools_num + 1);
> +	pools_num++;
>  }
>  
>  /* Keeps the preallocated memory to be used for a new stack depot pool. */
> @@ -262,10 +258,8 @@ static void depot_keep_new_pool(void **prealloc)
>  	/*
>  	 * If a new pool is already saved or the maximum number of
>  	 * pools is reached, do not use the preallocated memory.
> -	 * READ_ONCE is only used to mark the variable as atomic,
> -	 * there are no concurrent writes.
>  	 */
> -	if (!READ_ONCE(new_pool_required))
> +	if (!new_pool_required)

In my comment for the other patch I already suggested this change. Maybe
move it there.

>  		return;
>  
>  	/*
> @@ -281,9 +275,8 @@ static void depot_keep_new_pool(void **prealloc)
>  	 * At this point, either a new pool is kept or the maximum
>  	 * number of pools is reached. In either case, take note that
>  	 * keeping another pool is not required.
> -	 * smp_store_release pairs with smp_load_acquire in stack_depot_save.
>  	 */
> -	smp_store_release(&new_pool_required, 0);
> +	new_pool_required = false;
>  }
>  
>  /* Updates refences to the current and the next stack depot pools. */
> @@ -300,7 +293,7 @@ static bool depot_update_pools(void **prealloc)
>  
>  		/* Take note that we might need a new new_pool. */
>  		if (pools_num < DEPOT_MAX_POOLS)
> -			smp_store_release(&new_pool_required, 1);
> +			new_pool_required = true;
>  
>  		/* Try keeping the preallocated memory for new_pool. */
>  		goto out_keep_prealloc;
> @@ -369,18 +362,13 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
>  static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
>  {
>  	union handle_parts parts = { .handle = handle };
> -	/*
> -	 * READ_ONCE pairs with potential concurrent write in
> -	 * depot_init_pool.
> -	 */
> -	int pools_num_cached = READ_ONCE(pools_num);
>  	void *pool;
>  	size_t offset = parts.offset << DEPOT_STACK_ALIGN;
>  	struct stack_record *stack;

I'd add lockdep assertions to check that the lock is held appropriately
when entering various helper functions that don't actually take the
lock. Similarly for places that should not have the lock held you could
assert the lock is not held.
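
E.g. something like (sketch):

	static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
	{
		lockdep_assert_held(&pool_rwlock);
		/* ... */
	}

	static struct stack_record *
	depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
	{
		lockdep_assert_held_write(&pool_rwlock);
		/* ... */
	}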

> -	if (parts.pool_index > pools_num_cached) {
> +	if (parts.pool_index > pools_num) {
>  		WARN(1, "pool index %d out of bounds (%d) for stack id %08x\n",
> -			parts.pool_index, pools_num_cached, handle);
> +			parts.pool_index, pools_num, handle);
>  		return NULL;
>  	}
>  
> @@ -439,6 +427,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>  	depot_stack_handle_t handle = 0;
>  	struct page *page = NULL;
>  	void *prealloc = NULL;
> +	bool need_alloc = false;
>  	unsigned long flags;
>  	u32 hash;
>  
> @@ -458,22 +447,26 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>  	hash = hash_stack(entries, nr_entries);
>  	bucket = &stack_table[hash & stack_hash_mask];
>  
> -	/*
> -	 * Fast path: look the stack trace up without locking.
> -	 * smp_load_acquire pairs with smp_store_release to |bucket| below.
> -	 */
> -	found = find_stack(smp_load_acquire(bucket), entries, nr_entries, hash);
> -	if (found)
> +	read_lock_irqsave(&pool_rwlock, flags);
> +
> +	/* Fast path: look the stack trace up without full locking. */
> +	found = find_stack(*bucket, entries, nr_entries, hash);
> +	if (found) {
> +		read_unlock_irqrestore(&pool_rwlock, flags);
>  		goto exit;
> +	}
> +
> +	/* Take note if another stack pool needs to be allocated. */
> +	if (new_pool_required)
> +		need_alloc = true;
> +
> +	read_unlock_irqrestore(&pool_rwlock, flags);
>  
>  	/*
> -	 * Check if another stack pool needs to be allocated. If so, allocate
> -	 * the memory now: we won't be able to do that under the lock.
> -	 *
> -	 * smp_load_acquire pairs with smp_store_release
> -	 * in depot_update_pools and depot_keep_new_pool.
> +	 * Allocate memory for a new pool if required now:
> +	 * we won't be able to do that under the lock.
>  	 */
> -	if (unlikely(can_alloc && smp_load_acquire(&new_pool_required))) {
> +	if (unlikely(can_alloc && need_alloc)) {
>  		/*
>  		 * Zero out zone modifiers, as we don't have specific zone
>  		 * requirements. Keep the flags related to allocation in atomic
> @@ -487,7 +480,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>  			prealloc = page_address(page);
>  	}
>  
> -	raw_spin_lock_irqsave(&pool_lock, flags);
> +	write_lock_irqsave(&pool_rwlock, flags);
>  
>  	found = find_stack(*bucket, entries, nr_entries, hash);
>  	if (!found) {
> @@ -496,11 +489,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>  
>  		if (new) {
>  			new->next = *bucket;
> -			/*
> -			 * smp_store_release pairs with smp_load_acquire
> -			 * from |bucket| above.
> -			 */
> -			smp_store_release(bucket, new);
> +			*bucket = new;
>  			found = new;
>  		}
>  	} else if (prealloc) {
> @@ -511,7 +500,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>  		depot_keep_new_pool(&prealloc);
>  	}
>  
> -	raw_spin_unlock_irqrestore(&pool_lock, flags);
> +	write_unlock_irqrestore(&pool_rwlock, flags);
>  exit:
>  	if (prealloc) {
>  		/* Stack depot didn't use this memory, free it. */
> @@ -535,6 +524,7 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle,
>  			       unsigned long **entries)
>  {
>  	struct stack_record *stack;
> +	unsigned long flags;
>  
>  	*entries = NULL;
>  	/*
> @@ -546,8 +536,12 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle,
>  	if (!handle || stack_depot_disabled)
>  		return 0;
>  
> +	read_lock_irqsave(&pool_rwlock, flags);
> +
>  	stack = depot_fetch_stack(handle);
>  
> +	read_unlock_irqrestore(&pool_rwlock, flags);
> +
>  	*entries = stack->entries;
>  	return stack->size;
>  }
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 14/15] stackdepot: allow users to evict stack traces
  2023-08-29 17:11 ` [PATCH 14/15] stackdepot: allow users to evict stack traces andrey.konovalov
@ 2023-08-30  9:20   ` Marco Elver
  2023-09-04 18:47     ` Andrey Konovalov
  0 siblings, 1 reply; 50+ messages in thread
From: Marco Elver @ 2023-08-30  9:20 UTC (permalink / raw)
  To: andrey.konovalov
  Cc: Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Tue, Aug 29, 2023 at 07:11PM +0200, andrey.konovalov@linux.dev wrote:
> From: Andrey Konovalov <andreyknvl@google.com>
> 
> Add stack_depot_evict, a function that decrements a reference counter
> on a stack record and removes it from the stack depot once the counter
> reaches 0.
> 
> Internally, when removing a stack record, the function unlinks it from
> the hash table bucket and returns it to the freelist.
> 
> With this change, the users of stack depot can call stack_depot_evict
> when keeping a stack trace in the stack depot is not needed anymore.
> This avoids polluting the stack depot with irrelevant stack traces and
> thus leaves more space to store the relevant ones before the stack depot
> reaches its capacity.
> 
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> ---
>  include/linux/stackdepot.h | 11 ++++++++++
>  lib/stackdepot.c           | 43 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 54 insertions(+)
> 
> diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
> index e58306783d8e..b14da6797714 100644
> --- a/include/linux/stackdepot.h
> +++ b/include/linux/stackdepot.h
> @@ -121,6 +121,17 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
>  unsigned int stack_depot_fetch(depot_stack_handle_t handle,
>  			       unsigned long **entries);
>  
> +/**
> + * stack_depot_evict - Drop a reference to a stack trace from stack depot
> + *
> + * @handle:	Stack depot handle returned from stack_depot_save()
> + *
> + * The stack trace gets fully removed from stack depot once all references

"gets fully removed" -> "is evicted" ?

> + * to it has been dropped (once the number of stack_depot_evict calls matches

"has been" -> "have been"

> + * the number of stack_depot_save calls for this stack trace).
> + */
> +void stack_depot_evict(depot_stack_handle_t handle);
> +
>  /**
>   * stack_depot_print - Print a stack trace from stack depot
>   *
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index 641db97d8c7c..cf28720b842d 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -384,6 +384,13 @@ static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
>  	return stack;
>  }
>  
> +/* Frees stack into the freelist. */
> +static void depot_free_stack(struct stack_record *stack)
> +{
> +	stack->next = next_stack;
> +	next_stack = stack;
> +}
> +
>  /* Calculates the hash for a stack. */
>  static inline u32 hash_stack(unsigned long *entries, unsigned int size)
>  {
> @@ -555,6 +562,42 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle,
>  }
>  EXPORT_SYMBOL_GPL(stack_depot_fetch);
>  
> +void stack_depot_evict(depot_stack_handle_t handle)
> +{
> +	struct stack_record *stack, **bucket;
> +	unsigned long flags;
> +
> +	if (!handle || stack_depot_disabled)
> +		return;
> +
> +	write_lock_irqsave(&pool_rwlock, flags);
> +
> +	stack = depot_fetch_stack(handle);
> +	if (WARN_ON(!stack))
> +		goto out;
> +
> +	if (refcount_dec_and_test(&stack->count)) {
> +		/* Drop stack from the hash table. */
> +		if (stack->next)
> +			stack->next->prev = stack->prev;
> +		if (stack->prev)
> +			stack->prev->next = stack->next;
> +		else {
> +			bucket = &stack_table[stack->hash & stack_hash_mask];
> +			*bucket = stack->next;
> +		}
> +		stack->next = NULL;
> +		stack->prev = NULL;
> +
> +		/* Free stack. */
> +		depot_free_stack(stack);
> +	}
> +
> +out:
> +	write_unlock_irqrestore(&pool_rwlock, flags);
> +}
> +EXPORT_SYMBOL_GPL(stack_depot_evict);
> +
>  void stack_depot_print(depot_stack_handle_t stack)
>  {
>  	unsigned long *entries;
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 13/15] stackdepot: add backwards links to hash table buckets
  2023-08-29 17:11 ` [PATCH 13/15] stackdepot: add backwards links to hash table buckets andrey.konovalov
@ 2023-08-30  9:24   ` Marco Elver
  2023-09-13 17:07     ` Andrey Konovalov
  0 siblings, 1 reply; 50+ messages in thread
From: Marco Elver @ 2023-08-30  9:24 UTC (permalink / raw)
  To: andrey.konovalov
  Cc: Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Tue, Aug 29, 2023 at 07:11PM +0200, andrey.konovalov@linux.dev wrote:
> From: Andrey Konovalov <andreyknvl@google.com>
> 
> Maintain links in the stack records to previous entries within the
> hash table buckets.
> 
> This is a preparatory patch for implementing the eviction of stack records
> from the stack depot.
> 
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> ---
>  lib/stackdepot.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index a84c0debbb9e..641db97d8c7c 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -58,6 +58,7 @@ union handle_parts {
>  
>  struct stack_record {
>  	struct stack_record *next;	/* Link in hash table or freelist */
> +	struct stack_record *prev;	/* Link in hash table */

At this point this could be a normal list_head? Then you don't have to
roll your own doubly-linked list manipulation (and benefit from things
like CONFIG_DEBUG_LIST).
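
I.e. roughly (sketch; the freelist linkage could stay as-is or be
converted separately):

	struct stack_record {
		struct list_head hash_list;	/* Link in hash table bucket */
		...
	};

	static struct list_head *stack_table;	/* buckets become list heads */

	/* insert into bucket */
	list_add(&new->hash_list, bucket);

	/* evict from bucket */
	list_del(&stack->hash_list);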

>  	u32 hash;			/* Hash in hash table */
>  	u32 size;			/* Number of stored frames */
>  	union handle_parts handle;
> @@ -493,6 +494,9 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>  
>  		if (new) {
>  			new->next = *bucket;
> +			new->prev = NULL;
> +			if (*bucket)
> +				(*bucket)->prev = new;
>  			*bucket = new;
>  			found = new;
>  		}
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/15] stackdepot: add refcount for records
  2023-08-29 17:11 ` [PATCH 12/15] stackdepot: add refcount for records andrey.konovalov
@ 2023-08-30  9:32   ` Marco Elver
  2023-09-04 18:46     ` Andrey Konovalov
  2023-09-01 13:06   ` Kuan-Ying Lee (李冠穎)
  1 sibling, 1 reply; 50+ messages in thread
From: Marco Elver @ 2023-08-30  9:32 UTC (permalink / raw)
  To: andrey.konovalov
  Cc: Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Tue, Aug 29, 2023 at 07:11PM +0200, andrey.konovalov@linux.dev wrote:
> From: Andrey Konovalov <andreyknvl@google.com>
> 
> Add a reference counter for how many times a stack record has been added
> to stack depot.
> 
> Do not yet decrement the refcount; this is implemented in one of the
> following patches.
> 
> This is a preparatory patch for implementing the eviction of stack records
> from the stack depot.
> 
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> ---
>  lib/stackdepot.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index 5ad454367379..a84c0debbb9e 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -22,6 +22,7 @@
>  #include <linux/mutex.h>
>  #include <linux/percpu.h>
>  #include <linux/printk.h>
> +#include <linux/refcount.h>
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
>  #include <linux/stacktrace.h>
> @@ -60,6 +61,7 @@ struct stack_record {
>  	u32 hash;			/* Hash in hash table */
>  	u32 size;			/* Number of stored frames */
>  	union handle_parts handle;
> +	refcount_t count;
>  	unsigned long entries[DEPOT_STACK_MAX_FRAMES];	/* Frames */
>  };
>  
> @@ -348,6 +350,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
>  	stack->hash = hash;
>  	stack->size = size;
>  	/* stack->handle is already filled in by depot_init_pool. */
> +	refcount_set(&stack->count, 1);
>  	memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
>  
>  	/*
> @@ -452,6 +455,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>  	/* Fast path: look the stack trace up without full locking. */
>  	found = find_stack(*bucket, entries, nr_entries, hash);
>  	if (found) {
> +		refcount_inc(&found->count);

If someone doesn't use stack_depot_evict(), and the refcount eventually
overflows, it'll do a WARN (per refcount_warn_saturate()).

I think the interface needs to be different:

	stack_depot_get(): increments refcount (could be inline if just
	wrapper around refcount_inc())

	stack_depot_put(): what stack_depot_evict() currently does

Then it's clear that if someone uses either stack_depot_get() or _put()
that these need to be balanced. Not using either will result in the old
behaviour of never evicting an entry.
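
I.e. (prototypes only):

	/* Take an additional reference on a saved stack trace. */
	void stack_depot_get(depot_stack_handle_t handle);

	/* Drop a reference; the record is evicted once the count hits zero. */
	void stack_depot_put(depot_stack_handle_t handle);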

>  		read_unlock_irqrestore(&pool_rwlock, flags);
>  		goto exit;
>  	}
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 15/15] kasan: use stack_depot_evict for tag-based modes
  2023-08-29 17:11 ` [PATCH 15/15] kasan: use stack_depot_evict for tag-based modes andrey.konovalov
@ 2023-08-30  9:38   ` Marco Elver
  2023-09-04 18:48     ` Andrey Konovalov
  0 siblings, 1 reply; 50+ messages in thread
From: Marco Elver @ 2023-08-30  9:38 UTC (permalink / raw)
  To: andrey.konovalov
  Cc: Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Tue, Aug 29, 2023 at 07:11PM +0200, andrey.konovalov@linux.dev wrote:
> From: Andrey Konovalov <andreyknvl@google.com>
> 
> Evict stack traces from the stack depot for the tag-based KASAN modes
> once they are evicted from the stack ring.
> 
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> ---
>  mm/kasan/tags.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c
> index 7dcfe341d48e..fa6b0f77a7dd 100644
> --- a/mm/kasan/tags.c
> +++ b/mm/kasan/tags.c
> @@ -96,7 +96,7 @@ static void save_stack_info(struct kmem_cache *cache, void *object,
>  			gfp_t gfp_flags, bool is_free)
>  {
>  	unsigned long flags;
> -	depot_stack_handle_t stack;
> +	depot_stack_handle_t stack, old_stack;
>  	u64 pos;
>  	struct kasan_stack_ring_entry *entry;
>  	void *old_ptr;
> @@ -120,6 +120,8 @@ static void save_stack_info(struct kmem_cache *cache, void *object,
>  	if (!try_cmpxchg(&entry->ptr, &old_ptr, STACK_RING_BUSY_PTR))
>  		goto next; /* Busy slot. */
>  
> +	old_stack = READ_ONCE(entry->stack);

Why READ_ONCE? Is it possible that there is a concurrent writer once the
slot has been "locked" with STACK_RING_BUSY_PTR?

If there is no concurrency, it would be clearer to leave it unmarked and
add a comment to that effect. (I also think a comment would be good to
say what the WRITE_ONCEs below pair with, because at this point I've
forgotten.)

>  	WRITE_ONCE(entry->size, cache->object_size);
>  	WRITE_ONCE(entry->pid, current->pid);
>  	WRITE_ONCE(entry->stack, stack);
> @@ -131,6 +133,9 @@ static void save_stack_info(struct kmem_cache *cache, void *object,
>  	smp_store_release(&entry->ptr, (s64)object);
>  
>  	read_unlock_irqrestore(&stack_ring.lock, flags);
> +
> +	if (old_stack)
> +		stack_depot_evict(old_stack);
>  }
>  
>  void kasan_save_alloc_info(struct kmem_cache *cache, void *object, gfp_t flags)
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/15] stackdepot: add refcount for records
  2023-08-29 17:11 ` [PATCH 12/15] stackdepot: add refcount for records andrey.konovalov
  2023-08-30  9:32   ` Marco Elver
@ 2023-09-01 13:06   ` Kuan-Ying Lee (李冠穎)
  2023-09-04 18:46     ` Andrey Konovalov
  1 sibling, 1 reply; 50+ messages in thread
From: Kuan-Ying Lee (李冠穎) @ 2023-09-01 13:06 UTC (permalink / raw)
  To: glider, elver, andrey.konovalov
  Cc: andreyknvl, linux-kernel, linux-mm, andreyknvl, vbabka,
	kasan-dev, dvyukov, akpm,
	Kuan-Ying Lee (李冠穎),
	eugenis

On Tue, 2023-08-29 at 19:11 +0200, andrey.konovalov@linux.dev wrote:
> From: Andrey Konovalov <andreyknvl@google.com>
> 
> Add a reference counter for how many times a stack record has been
> added
> to stack depot.
> 
> Do not yet decrement the refcount; this is implemented in one of the
> following patches.
> 
> This is a preparatory patch for implementing the eviction of stack
> records
> from the stack depot.
> 
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> ---
>  lib/stackdepot.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index 5ad454367379..a84c0debbb9e 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -22,6 +22,7 @@
>  #include <linux/mutex.h>
>  #include <linux/percpu.h>
>  #include <linux/printk.h>
> +#include <linux/refcount.h>
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
>  #include <linux/stacktrace.h>
> @@ -60,6 +61,7 @@ struct stack_record {
>  	u32 hash;			/* Hash in hash table */
>  	u32 size;			/* Number of stored frames */
>  	union handle_parts handle;
> +	refcount_t count;
>  	unsigned long entries[DEPOT_STACK_MAX_FRAMES];	/* Frames */
>  };
>  
> @@ -348,6 +350,7 @@ depot_alloc_stack(unsigned long *entries, int
> size, u32 hash, void **prealloc)
>  	stack->hash = hash;
>  	stack->size = size;
>  	/* stack->handle is already filled in by depot_init_pool. */
> +	refcount_set(&stack->count, 1);
>  	memcpy(stack->entries, entries, flex_array_size(stack, entries,
> size));
>  
>  	/*
> @@ -452,6 +455,7 @@ depot_stack_handle_t __stack_depot_save(unsigned
> long *entries,
>  	/* Fast path: look the stack trace up without full locking. */
>  	found = find_stack(*bucket, entries, nr_entries, hash);
>  	if (found) {
> +		refcount_inc(&found->count);
>  		read_unlock_irqrestore(&pool_rwlock, flags);
>  		goto exit;
>  	}

Hi Andrey,

There are two find_stack() function calls in __stack_depot_save().

Maybe we need to add refcount_inc() after both find_stack() calls?
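
I.e. for the second lookup under the write lock, roughly:

	found = find_stack(*bucket, entries, nr_entries, hash);
	if (!found) {
		/* allocate and insert a new record */
		...
	} else {
		/* Another CPU inserted this stack after the read-locked lookup. */
		refcount_inc(&found->count);
		if (prealloc)
			depot_keep_new_pool(&prealloc);
	}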

Thanks,
Kuan-Ying Lee

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/15] stackdepot: allow evicting stack traces
  2023-08-30  7:46 ` [PATCH 00/15] stackdepot: allow evicting stack traces Vlastimil Babka
@ 2023-09-04 18:45   ` Andrey Konovalov
  2023-09-05  2:48     ` Kuan-Ying Lee (李冠穎)
  0 siblings, 1 reply; 50+ messages in thread
From: Andrey Konovalov @ 2023-09-04 18:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: andrey.konovalov, Marco Elver, Alexander Potapenko,
	Dmitry Vyukov, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Wed, Aug 30, 2023 at 9:46 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> I wonder if there's also another thing to consider for the future:
>
> 3. With the number of stackdepot users increasing, each having their
> distinct set of stacks from others, would it make sense to create a separate
> "storage instance" for each user instead of putting everything in a single
> shared one?

This shouldn't be hard to implement. However, do you see any
particular use cases for this?

One thing that comes to mind is that the users will then be able to
create/destroy stack depot instances when required. But I don't know
if any of the users need this: so far they all seem to require stack
depot throughout the whole lifetime of the system.

> In any case, evicting support is a good development, thanks!

Thank you!

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/15] stackdepot: fix and clean-up atomic annotations
  2023-08-30  8:34   ` Marco Elver
@ 2023-09-04 18:45     ` Andrey Konovalov
  0 siblings, 0 replies; 50+ messages in thread
From: Andrey Konovalov @ 2023-09-04 18:45 UTC (permalink / raw)
  To: Marco Elver
  Cc: andrey.konovalov, Alexander Potapenko, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Wed, Aug 30, 2023 at 10:34 AM Marco Elver <elver@google.com> wrote:
>
> > diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> > index 93191ee70fc3..9ae71e1ef1a7 100644
> > --- a/lib/stackdepot.c
> > +++ b/lib/stackdepot.c
> > @@ -226,10 +226,10 @@ static void depot_init_pool(void **prealloc)
> >       /*
> >        * If the next pool is already initialized or the maximum number of
> >        * pools is reached, do not use the preallocated memory.
> > -      * smp_load_acquire() here pairs with smp_store_release() below and
> > -      * in depot_alloc_stack().
> > +      * READ_ONCE is only used to mark the variable as atomic,
> > +      * there are no concurrent writes.
>
> This doesn't make sense. If there are no concurrent writes, we should
> drop the marking, so that if there are concurrent writes, tools like
> KCSAN can tell us about it if our assumption was wrong.

Makes sense, will do in v2.

> > @@ -425,8 +424,8 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
> >        * Check if another stack pool needs to be initialized. If so, allocate
> >        * the memory now - we won't be able to do that under the lock.
> >        *
> > -      * The smp_load_acquire() here pairs with smp_store_release() to
> > -      * |next_pool_inited| in depot_alloc_stack() and depot_init_pool().
> > +      * smp_load_acquire pairs with smp_store_release
> > +      * in depot_alloc_stack and depot_init_pool.
>
> Reflow comment to match 80 cols used by comments elsewhere.

Will do in v2.

Thanks!

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 11/15] stackdepot: use read/write lock
  2023-08-30  9:13   ` Marco Elver
@ 2023-09-04 18:46     ` Andrey Konovalov
  2023-09-05 16:19       ` Marco Elver
  0 siblings, 1 reply; 50+ messages in thread
From: Andrey Konovalov @ 2023-09-04 18:46 UTC (permalink / raw)
  To: Marco Elver
  Cc: andrey.konovalov, Alexander Potapenko, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Wed, Aug 30, 2023 at 11:13 AM Marco Elver <elver@google.com> wrote:
>
> > -static int new_pool_required = 1;
> > +static bool new_pool_required = true;
> > +/* Lock that protects the variables above. */
> > +static DEFINE_RWLOCK(pool_rwlock);
>
> Despite this being a rwlock, it'll introduce tons of (cache) contention
> for the common case (stack depot entry exists).
>
> If creating new stack depot entries is only common during "warm-up" and
> then becomes exceedingly rare, I think a percpu-rwsem (read-lock is a
> CPU-local access, but write-locking is expensive) may be preferable.

Good suggestion. I propose that we keep the rwlock for now, and I'll
check whether the performance is better with percpu-rwsem once I get
to implementing and testing the performance changes. I'll also check
whether percpu-rwsem makes sense for stack ring in tag-based KASAN
modes.

> > @@ -262,10 +258,8 @@ static void depot_keep_new_pool(void **prealloc)
> >       /*
> >        * If a new pool is already saved or the maximum number of
> >        * pools is reached, do not use the preallocated memory.
> > -      * READ_ONCE is only used to mark the variable as atomic,
> > -      * there are no concurrent writes.
> >        */
> > -     if (!READ_ONCE(new_pool_required))
> > +     if (!new_pool_required)
>
> In my comment for the other patch I already suggested this change. Maybe
> move it there.

Will do in v2.

>
> >               return;
> >
> >       /*
> > @@ -281,9 +275,8 @@ static void depot_keep_new_pool(void **prealloc)
> >        * At this point, either a new pool is kept or the maximum
> >        * number of pools is reached. In either case, take note that
> >        * keeping another pool is not required.
> > -      * smp_store_release pairs with smp_load_acquire in stack_depot_save.
> >        */
> > -     smp_store_release(&new_pool_required, 0);
> > +     new_pool_required = false;
> >  }
> >
> >  /* Updates refences to the current and the next stack depot pools. */
> > @@ -300,7 +293,7 @@ static bool depot_update_pools(void **prealloc)
> >
> >               /* Take note that we might need a new new_pool. */
> >               if (pools_num < DEPOT_MAX_POOLS)
> > -                     smp_store_release(&new_pool_required, 1);
> > +                     new_pool_required = true;
> >
> >               /* Try keeping the preallocated memory for new_pool. */
> >               goto out_keep_prealloc;
> > @@ -369,18 +362,13 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
> >  static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
> >  {
> >       union handle_parts parts = { .handle = handle };
> > -     /*
> > -      * READ_ONCE pairs with potential concurrent write in
> > -      * depot_init_pool.
> > -      */
> > -     int pools_num_cached = READ_ONCE(pools_num);
> >       void *pool;
> >       size_t offset = parts.offset << DEPOT_STACK_ALIGN;
> >       struct stack_record *stack;
>
> I'd add lockdep assertions to check that the lock is held appropriately
> when entering various helper functions that don't actually take the
> lock. Similarly for places that should not have the lock held you could
> assert the lock is not held.

Will do in v2.
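
Something along these lines, I assume (a sketch; which helpers expect
the read lock vs the write lock still needs to be double-checked):

static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
{
	lockdep_assert_held(&pool_rwlock);
	/* ... existing body ... */
}

static void depot_keep_new_pool(void **prealloc)
{
	lockdep_assert_held_write(&pool_rwlock);
	/* ... existing body ... */
}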

Thanks!

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/15] stackdepot: add refcount for records
  2023-08-30  9:32   ` Marco Elver
@ 2023-09-04 18:46     ` Andrey Konovalov
  2023-09-04 18:55       ` Marco Elver
  0 siblings, 1 reply; 50+ messages in thread
From: Andrey Konovalov @ 2023-09-04 18:46 UTC (permalink / raw)
  To: Marco Elver
  Cc: andrey.konovalov, Alexander Potapenko, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Wed, Aug 30, 2023 at 11:33 AM Marco Elver <elver@google.com> wrote:
>
> If someone doesn't use stack_depot_evict(), and the refcount eventually
> overflows, it'll do a WARN (per refcount_warn_saturate()).
>
> I think the interface needs to be different:
>
>         stack_depot_get(): increments refcount (could be inline if just
>         wrapper around refcount_inc())
>
>         stack_depot_put(): what stack_depot_evict() currently does
>
> Then it's clear that if someone uses either stack_depot_get() or _put()
> that these need to be balanced. Not using either will result in the old
> behaviour of never evicting an entry.

So you mean the exported interface needs to be different? And the
users will need to call both stack_depot_save+stack_depot_get for
saving? Hm, this seems odd.

WDYT about adding a new flavor of stack_depot_save called
stack_depot_save_get that would increment the refcount? And renaming
stack_depot_evict to stack_depot_put.

I'm not sure though if the overflow is actually an issue. Hitting that
would require calling stack_depot_save INT_MAX times.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/15] stackdepot: add refcount for records
  2023-09-01 13:06   ` Kuan-Ying Lee (李冠穎)
@ 2023-09-04 18:46     ` Andrey Konovalov
  0 siblings, 0 replies; 50+ messages in thread
From: Andrey Konovalov @ 2023-09-04 18:46 UTC (permalink / raw)
  To: Kuan-Ying Lee (李冠穎)
  Cc: glider, elver, andrey.konovalov, andreyknvl, linux-kernel,
	linux-mm, vbabka, kasan-dev, dvyukov, akpm, eugenis

On Fri, Sep 1, 2023 at 3:06 PM 'Kuan-Ying Lee (李冠穎)' via kasan-dev
<kasan-dev@googlegroups.com> wrote:
>
> > @@ -452,6 +455,7 @@ depot_stack_handle_t __stack_depot_save(unsigned
> > long *entries,
> >       /* Fast path: look the stack trace up without full locking. */
> >       found = find_stack(*bucket, entries, nr_entries, hash);
> >       if (found) {
> > +             refcount_inc(&found->count);
> >               read_unlock_irqrestore(&pool_rwlock, flags);
> >               goto exit;
> >       }
>
> Hi Andrey,
>
> There are two find_stack() function calls in __stack_depot_save().
>
> Maybe we need to add refcount_inc() for both two find_stack()?

Indeed, good catch! Will fix in v2.
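
Roughly like this for the second (slow-path) lookup, assuming
depot_alloc_stack() initializes the refcount of a new record to 1
(sketch):

found = find_stack(*bucket, entries, nr_entries, hash);
if (!found) {
	struct stack_record *new =
		depot_alloc_stack(entries, nr_entries, hash, &prealloc);

	if (new) {
		/* Link the new record into the hash bucket. */
		new->next = *bucket;
		*bucket = new;
		found = new;
	}
} else {
	/* Saved by another task after the fast-path lookup failed. */
	refcount_inc(&found->count);
}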

Thanks!

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 14/15] stackdepot: allow users to evict stack traces
  2023-08-30  9:20   ` Marco Elver
@ 2023-09-04 18:47     ` Andrey Konovalov
  0 siblings, 0 replies; 50+ messages in thread
From: Andrey Konovalov @ 2023-09-04 18:47 UTC (permalink / raw)
  To: Marco Elver
  Cc: andrey.konovalov, Alexander Potapenko, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Wed, Aug 30, 2023 at 11:20 AM Marco Elver <elver@google.com> wrote:
>
> > +/**
> > + * stack_depot_evict - Drop a reference to a stack trace from stack depot
> > + *
> > + * @handle:  Stack depot handle returned from stack_depot_save()
> > + *
> > + * The stack trace gets fully removed from stack depot once all references
>
> "gets fully removed" -> "is evicted" ?
>
> > + * to it has been dropped (once the number of stack_depot_evict calls matches
>
> "has been" -> "have been"

Will fix both in v2. Thanks!

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 15/15] kasan: use stack_depot_evict for tag-based modes
  2023-08-30  9:38   ` Marco Elver
@ 2023-09-04 18:48     ` Andrey Konovalov
  2023-09-04 18:58       ` Marco Elver
  0 siblings, 1 reply; 50+ messages in thread
From: Andrey Konovalov @ 2023-09-04 18:48 UTC (permalink / raw)
  To: Marco Elver
  Cc: andrey.konovalov, Alexander Potapenko, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Wed, Aug 30, 2023 at 11:38 AM Marco Elver <elver@google.com> wrote:
>
> > --- a/mm/kasan/tags.c
> > +++ b/mm/kasan/tags.c
> > @@ -96,7 +96,7 @@ static void save_stack_info(struct kmem_cache *cache, void *object,
> >                       gfp_t gfp_flags, bool is_free)
> >  {
> >       unsigned long flags;
> > -     depot_stack_handle_t stack;
> > +     depot_stack_handle_t stack, old_stack;
> >       u64 pos;
> >       struct kasan_stack_ring_entry *entry;
> >       void *old_ptr;
> > @@ -120,6 +120,8 @@ static void save_stack_info(struct kmem_cache *cache, void *object,
> >       if (!try_cmpxchg(&entry->ptr, &old_ptr, STACK_RING_BUSY_PTR))
> >               goto next; /* Busy slot. */
> >
> > +     old_stack = READ_ONCE(entry->stack);
>
> Why READ_ONCE? Is it possible that there is a concurrent writer once the
> slot has been "locked" with STACK_RING_BUSY_PTR?
>
> If there is no concurrency, it would be clearer to leave it unmarked and
> add a comment to that effect. (I also think a comment would be good to
> say what the WRITE_ONCE below pair with, because at this point I've
> forgotten.)

Hm, I actually suspect we don't need these READ/WRITE_ONCE to entry
fields at all. This seems to be a leftover from the initial series
when I didn't yet have the rwlock. The rwlock prevents the entries
from being read (in kasan_complete_mode_report_info) while being
written and the try_cmpxchg prevents the same entry from being
rewritten (in the unlikely case of wrapping during writing).
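
I.e., the accesses would become plain ones with a comment, something
like (sketch):

/*
 * The slot is exclusively owned here: entry->ptr was claimed via
 * try_cmpxchg with STACK_RING_BUSY_PTR, and the report side
 * (kasan_complete_mode_report_info) takes the stack ring rwlock
 * for writing, so plain accesses are enough.
 */
old_stack = entry->stack;
/* ... fill in the other entry fields ... */
entry->stack = stack;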

Marco, do you think we can drop these READ/WRITE_ONCE?

Thanks!

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/15] stackdepot: add refcount for records
  2023-09-04 18:46     ` Andrey Konovalov
@ 2023-09-04 18:55       ` Marco Elver
  2023-09-13 17:07         ` Andrey Konovalov
  0 siblings, 1 reply; 50+ messages in thread
From: Marco Elver @ 2023-09-04 18:55 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: andrey.konovalov, Alexander Potapenko, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Mon, 4 Sept 2023 at 20:46, Andrey Konovalov <andreyknvl@gmail.com> wrote:
>
> On Wed, Aug 30, 2023 at 11:33 AM Marco Elver <elver@google.com> wrote:
> >
> > If someone doesn't use stack_depot_evict(), and the refcount eventually
> > overflows, it'll do a WARN (per refcount_warn_saturate()).
> >
> > I think the interface needs to be different:
> >
> >         stack_depot_get(): increments refcount (could be inline if just
> >         wrapper around refcount_inc())
> >
> >         stack_depot_put(): what stack_depot_evict() currently does
> >
> > Then it's clear that if someone uses either stack_depot_get() or _put()
> > that these need to be balanced. Not using either will result in the old
> > behaviour of never evicting an entry.
>
> So you mean the exported interface needs to be different? And the
> users will need to call both stack_depot_save+stack_depot_get for
> saving? Hm, this seems odd.
>
> WDYT about adding a new flavor of stack_depot_save called
> stack_depot_save_get that would increment the refcount? And renaming
> stack_depot_evict to stack_depot_put.

If there are no other uses of stack_depot_get(), which seems likely,
just stack_depot_save_get() seems ok.

> I'm not sure though if the overflow is actually an issue. Hitting that
> would require calling stack_depot_save INT_MAX times.

With a long-running kernel it's possible.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 15/15] kasan: use stack_depot_evict for tag-based modes
  2023-09-04 18:48     ` Andrey Konovalov
@ 2023-09-04 18:58       ` Marco Elver
  2023-09-13 17:08         ` Andrey Konovalov
  0 siblings, 1 reply; 50+ messages in thread
From: Marco Elver @ 2023-09-04 18:58 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: andrey.konovalov, Alexander Potapenko, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Mon, 4 Sept 2023 at 20:48, Andrey Konovalov <andreyknvl@gmail.com> wrote:
>
> On Wed, Aug 30, 2023 at 11:38 AM Marco Elver <elver@google.com> wrote:
> >
> > > --- a/mm/kasan/tags.c
> > > +++ b/mm/kasan/tags.c
> > > @@ -96,7 +96,7 @@ static void save_stack_info(struct kmem_cache *cache, void *object,
> > >                       gfp_t gfp_flags, bool is_free)
> > >  {
> > >       unsigned long flags;
> > > -     depot_stack_handle_t stack;
> > > +     depot_stack_handle_t stack, old_stack;
> > >       u64 pos;
> > >       struct kasan_stack_ring_entry *entry;
> > >       void *old_ptr;
> > > @@ -120,6 +120,8 @@ static void save_stack_info(struct kmem_cache *cache, void *object,
> > >       if (!try_cmpxchg(&entry->ptr, &old_ptr, STACK_RING_BUSY_PTR))
> > >               goto next; /* Busy slot. */
> > >
> > > +     old_stack = READ_ONCE(entry->stack);
> >
> > Why READ_ONCE? Is it possible that there is a concurrent writer once the
> > slot has been "locked" with STACK_RING_BUSY_PTR?
> >
> > If there is no concurrency, it would be clearer to leave it unmarked and
> > add a comment to that effect. (I also think a comment would be good to
> > say what the WRITE_ONCE below pair with, because at this point I've
> > forgotten.)
>
> Hm, I actually suspect we don't need these READ/WRITE_ONCE to entry
> fields at all. This seems to be a leftover from the initial series
> when I didn't yet have the rwlock. The rwlock prevents the entries
> from being read (in kasan_complete_mode_report_info) while being
> written and the try_cmpxchg prevents the same entry from being
> rewritten (in the unlikely case of wrapping during writing).
>
> Marco, do you think we can drop these READ/WRITE_ONCE?

Yes, I think they can be dropped.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/15] stackdepot: allow evicting stack traces
  2023-09-04 18:45   ` Andrey Konovalov
@ 2023-09-05  2:48     ` Kuan-Ying Lee (李冠穎)
  2023-09-13 17:10       ` Andrey Konovalov
  0 siblings, 1 reply; 50+ messages in thread
From: Kuan-Ying Lee (李冠穎) @ 2023-09-05  2:48 UTC (permalink / raw)
  To: andreyknvl, vbabka
  Cc: andreyknvl, linux-kernel, linux-mm, kasan-dev, dvyukov, akpm,
	Kuan-Ying Lee (李冠穎),
	elver, eugenis, andrey.konovalov, glider

On Mon, 2023-09-04 at 20:45 +0200, Andrey Konovalov wrote:
> On Wed, Aug 30, 2023 at 9:46 AM Vlastimil Babka <vbabka@suse.cz>
> wrote:
> > 
> > I wonder if there's also another thing to consider for the future:
> > 
> > 3. With the number of stackdepot users increasing, each having
> > their
> > distinct set of stacks from others, would it make sense to create
> > separate
> > "storage instance" for each user instead of putting everything in a
> > single
> > shared one?
> 
> This shouldn't be hard to implement. However, do you see any
> particular use cases for this?
> 
> One thing that comes to mind is that the users will then be able to
> create/destroy stack depot instances when required. But I don't know
> if any of the users need this: so far they all seem to require stack
> depot throughout the whole lifetime of the system.
> 
Maybe we can use eviction in page_owner and slub_debug
(SLAB_STORE_USER).

After we update page_owner->handle, we could evict the previous
handle?
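
Something like this, I imagine (just a sketch; this would also require
page_owner to take a reference when saving, so that a handle shared by
several pages is not freed too early):

depot_stack_handle_t old_handle = page_owner->handle;

page_owner->handle = new_handle;
if (old_handle)
	stack_depot_evict(old_handle);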

> > In any case, evicting support is a good development, thanks!
> 
> Thank you!
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 11/15] stackdepot: use read/write lock
  2023-09-04 18:46     ` Andrey Konovalov
@ 2023-09-05 16:19       ` Marco Elver
  2023-09-13 17:08         ` Andrey Konovalov
  0 siblings, 1 reply; 50+ messages in thread
From: Marco Elver @ 2023-09-05 16:19 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: andrey.konovalov, Alexander Potapenko, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Mon, 4 Sept 2023 at 20:46, Andrey Konovalov <andreyknvl@gmail.com> wrote:
>
> On Wed, Aug 30, 2023 at 11:13 AM Marco Elver <elver@google.com> wrote:
> >
> > > -static int new_pool_required = 1;
> > > +static bool new_pool_required = true;
> > > +/* Lock that protects the variables above. */
> > > +static DEFINE_RWLOCK(pool_rwlock);
> >
> > Despite this being a rwlock, it'll introduce tons of (cache) contention
> > for the common case (stack depot entry exists).
> >
> > If creating new stack depot entries is only common during "warm-up" and
> > then becomes exceedingly rare, I think a percpu-rwsem (read-lock is a
> > CPU-local access, but write-locking is expensive) may be preferable.
>
> Good suggestion. I propose that we keep the rwlock for now, and I'll
> check whether the performance is better with percpu-rwsem once I get
> to implementing and testing the performance changes. I'll also check
> whether percpu-rwsem makes sense for stack ring in tag-based KASAN
> modes.

I think it's quite obvious that the percpu-rwsem is better. A simple
experiment is to measure the ratio of stackdepot hits vs misses. If
the ratio is obviously skewed towards hits, then I'd just go with the
percpu-rwsem.

The performance benefit may not be measurable if you use a small system.
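
A crude way to get that ratio would be temporary debug counters, e.g.:

static atomic_long_t depot_hits, depot_misses;

/* In __stack_depot_save, after the lookup: */
found = find_stack(*bucket, entries, nr_entries, hash);
if (found)
	atomic_long_inc(&depot_hits);
else
	atomic_long_inc(&depot_misses);

/* And dump the counters periodically or from a debugfs file: */
pr_info("stackdepot: hits=%ld misses=%ld\n",
	atomic_long_read(&depot_hits), atomic_long_read(&depot_misses));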

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 05/15] stackdepot: use fixed-sized slots for stack records
  2023-08-30  8:21   ` Alexander Potapenko
@ 2023-09-13 17:07     ` Andrey Konovalov
  2023-09-15 10:36       ` Alexander Potapenko
  0 siblings, 1 reply; 50+ messages in thread
From: Andrey Konovalov @ 2023-09-13 17:07 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: andrey.konovalov, Marco Elver, Dmitry Vyukov, Vlastimil Babka,
	kasan-dev, Evgenii Stepanov, Andrew Morton, linux-mm,
	linux-kernel, Andrey Konovalov

On Wed, Aug 30, 2023 at 10:22 AM Alexander Potapenko <glider@google.com> wrote:
>
> On Tue, Aug 29, 2023 at 7:11 PM <andrey.konovalov@linux.dev> wrote:
> >
> > From: Andrey Konovalov <andreyknvl@google.com>
> >
> > Instead of storing stack records in stack depot pools one right after
> > another, use 32-frame-sized slots.
>
> I am slightly concerned about the KMSAN use case here, which defines
> KMSAN_STACK_DEPTH to 64.

Hm, indeed. KASAN actually also defines the depth as 64.

I think it's reasonable to change the default value to 64 to cover all
the existing users. And whoever wants to save on memory can change
the Kconfig parameter (I'll add one as you suggested).

> I don't have a comprehensive stack depth breakdown, but a quick poking
> around syzkaller.appspot.com shows several cases where the stacks are
> actually longer than 32 frames.

Whichever value we choose, some stack traces will unfortunately not fit.
But yeah, 64 seems to be a more reasonable value.

> Can you add a config parameter for the stack depth instead of
> mandating 32 frames everywhere?

Sure, will do in v2.
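
On the C side this would then boil down to something like (sketch; the
config name is made up, the actual one is to be decided):

/* Assumes a new Kconfig int option, e.g. CONFIG_STACKDEPOT_MAX_FRAMES. */
#define DEPOT_STACK_MAX_FRAMES	CONFIG_STACKDEPOT_MAX_FRAMES

struct stack_record {
	/* ... other fields ... */
	unsigned long entries[DEPOT_STACK_MAX_FRAMES];	/* Fixed-sized slot */
};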

> As a side note, kmsan_internal_chain_origin()
> (https://elixir.bootlin.com/linux/latest/source/mm/kmsan/core.c#L214)
> creates small 3-frame records in the stack depot to link two stacks
> together, which will add unnecessary stackdepot pressure.
> But this can be fixed by storing both the new stack trace and the link
> to the old stack trace in the same record.

Do you mean this can be fixed in KMSAN? Or do you mean some kind of
extension to the stack depot interface?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 13/15] stackdepot: add backwards links to hash table buckets
  2023-08-30  9:24   ` Marco Elver
@ 2023-09-13 17:07     ` Andrey Konovalov
  0 siblings, 0 replies; 50+ messages in thread
From: Andrey Konovalov @ 2023-09-13 17:07 UTC (permalink / raw)
  To: Marco Elver
  Cc: andrey.konovalov, Alexander Potapenko, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Wed, Aug 30, 2023 at 11:24 AM Marco Elver <elver@google.com> wrote:
>
> > diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> > index a84c0debbb9e..641db97d8c7c 100644
> > --- a/lib/stackdepot.c
> > +++ b/lib/stackdepot.c
> > @@ -58,6 +58,7 @@ union handle_parts {
> >
> >  struct stack_record {
> >       struct stack_record *next;      /* Link in hash table or freelist */
> > +     struct stack_record *prev;      /* Link in hash table */
>
> At this point this could be a normal list_head? Then you don't have to
> roll your own doubly-linked list manipulation (and benefit from things
> like CONFIG_LIST_DEBUG).

Yeah, I think this makes sense. Will do in v2. Thanks!
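
Roughly (sketch; the same list_head could probably also replace the
hand-rolled freelist linkage, since a record is either in a bucket or
on the freelist):

struct stack_record {
	struct list_head list;		/* Links in hash table or freelist */
	u32 hash;			/* Hash in hash table */
	/* ... other fields ... */
};

/* Insertion into a hash bucket (bucket is a struct list_head *): */
list_add(&stack->list, bucket);

/* Removal on eviction: */
list_del(&stack->list);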

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/15] stackdepot: add refcount for records
  2023-09-04 18:55       ` Marco Elver
@ 2023-09-13 17:07         ` Andrey Konovalov
  0 siblings, 0 replies; 50+ messages in thread
From: Andrey Konovalov @ 2023-09-13 17:07 UTC (permalink / raw)
  To: Marco Elver
  Cc: andrey.konovalov, Alexander Potapenko, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Mon, Sep 4, 2023 at 8:56 PM Marco Elver <elver@google.com> wrote:
>
> > WDYT about adding a new flavor of stack_depot_save called
> > stack_depot_save_get that would increment the refcount? And renaming
> > stack_depot_evict to stack_depot_put.
>
> If there are no other uses of stack_depot_get(), which seems likely,
> just stack_depot_save_get() seems ok.

Ok, I will implement a similar approach in v2: add another flag to
__stack_depot_save to avoid multiplying API functions.
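
I.e., something like (sketch; assuming the existing can_alloc bool also
becomes a flag, names are illustrative):

#define STACK_DEPOT_FLAG_CAN_ALLOC	0x1
#define STACK_DEPOT_FLAG_GET		0x2

/* KASAN tag-based modes would then save with: */
stack = __stack_depot_save(entries, nr_entries, gfp_flags,
			   STACK_DEPOT_FLAG_CAN_ALLOC |
			   STACK_DEPOT_FLAG_GET);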

Thanks!

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 15/15] kasan: use stack_depot_evict for tag-based modes
  2023-09-04 18:58       ` Marco Elver
@ 2023-09-13 17:08         ` Andrey Konovalov
  0 siblings, 0 replies; 50+ messages in thread
From: Andrey Konovalov @ 2023-09-13 17:08 UTC (permalink / raw)
  To: Marco Elver
  Cc: andrey.konovalov, Alexander Potapenko, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Mon, Sep 4, 2023 at 8:59 PM Marco Elver <elver@google.com> wrote:
>
> >
> > Hm, I actually suspect we don't need these READ/WRITE_ONCE to entry
> > fields at all. This seems to be a leftover from the initial series
> > when I didn't yet have the rwlock. The rwlock prevents the entries
> > from being read (in kasan_complete_mode_report_info) while being
> > written and the try_cmpxchg prevents the same entry from being
> > rewritten (in the unlikely case of wrapping during writing).
> >
> > Marco, do you think we can drop these READ/WRITE_ONCE?
>
> Yes, I think they can be dropped.

Will drop in v2, thanks!

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 11/15] stackdepot: use read/write lock
  2023-09-05 16:19       ` Marco Elver
@ 2023-09-13 17:08         ` Andrey Konovalov
  2024-01-02 12:59           ` Marco Elver
  0 siblings, 1 reply; 50+ messages in thread
From: Andrey Konovalov @ 2023-09-13 17:08 UTC (permalink / raw)
  To: Marco Elver
  Cc: andrey.konovalov, Alexander Potapenko, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Tue, Sep 5, 2023 at 6:19 PM Marco Elver <elver@google.com> wrote:
>
> > Good suggestion. I propose that we keep the rwlock for now, and I'll
> > check whether the performance is better with percpu-rwsem once I get
> > to implementing and testing the performance changes. I'll also check
> > whether percpu-rwsem makes sense for stack ring in tag-based KASAN
> > modes.
>
> I think it's quite obvious that the percpu-rwsem is better. A simple
> experiment is to measure the ratio of stackdepot hits vs misses. If
> the ratio is obviously skewed towards hits, then I'd just go with the
> percpu-rwsem.
>
> The performance benefit may not be measurable if you use a small system.

I started looking into using percpu-rwsem, but it appears that it
doesn't have the irqsave/irqrestore API flavor. I suspect that it
shouldn't be hard to add it, but I'd rather not pursue this as a part
of this series.

So I still propose to keep the rwlock for now, and switch to
percpu-rwsem later together with the other perf changes.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/15] stackdepot: allow evicting stack traces
  2023-09-05  2:48     ` Kuan-Ying Lee (李冠穎)
@ 2023-09-13 17:10       ` Andrey Konovalov
  0 siblings, 0 replies; 50+ messages in thread
From: Andrey Konovalov @ 2023-09-13 17:10 UTC (permalink / raw)
  To: Kuan-Ying Lee (李冠穎)
  Cc: vbabka, andreyknvl, linux-kernel, linux-mm, kasan-dev, dvyukov,
	akpm, elver, eugenis, andrey.konovalov, glider

On Tue, Sep 5, 2023 at 4:48 AM 'Kuan-Ying Lee (李冠穎)' via kasan-dev
<kasan-dev@googlegroups.com> wrote:
>
> > > 3. With the number of stackdepot users increasing, each having
> > > their
> > > distinct set of stacks from others, would it make sense to create
> > > separate
> > > "storage instance" for each user instead of putting everything in a
> > > single
> > > shared one?
> >
> > This shouldn't be hard to implement. However, do you see any
> > particular use cases for this?
> >
> > One thing that comes to mind is that the users will then be able to
> > create/destroy stack depot instances when required. But I don't know
> > if any of the users need this: so far they all seem to require stack
> > depot throughout the whole lifetime of the system.
> >
> Maybe we can use eviction in page_owner and slub_debug
> (SLAB_STORE_USER).
>
> After we update page_owner->handle, we could evict the previous
> handle?

We can definitely adapt more users to the new API. My comment was
related to the suggestion of splitting the stack depot storage across
different users.

But actually, I now have an answer to my own question about the split: if
each user has a separate stack depot storage instance, they can set the
maximum stack trace size as they desire and thus save on memory.
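
For the record, a purely hypothetical shape for such a split (nothing
like this exists today, names are made up):

/* Hypothetical per-user stack depot instances. */
struct stack_depot *depot = stack_depot_create(max_frames, max_pools);

handle = stack_depot_save_instance(depot, entries, nr_entries, gfp_flags);
nr_entries = stack_depot_fetch_instance(depot, handle, &entries);

stack_depot_destroy(depot);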

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 05/15] stackdepot: use fixed-sized slots for stack records
  2023-09-13 17:07     ` Andrey Konovalov
@ 2023-09-15 10:36       ` Alexander Potapenko
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Potapenko @ 2023-09-15 10:36 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: andrey.konovalov, Marco Elver, Dmitry Vyukov, Vlastimil Babka,
	kasan-dev, Evgenii Stepanov, Andrew Morton, linux-mm,
	linux-kernel, Andrey Konovalov

> > As a side note, kmsan_internal_chain_origin()
> > (https://elixir.bootlin.com/linux/latest/source/mm/kmsan/core.c#L214)
> > creates small 3-frame records in the stack depot to link two stacks
> > together, which will add unnecessary stackdepot pressure.
> > But this can be fixed by storing both the new stack trace and the link
> > to the old stack trace in the same record.
>
> Do you mean this can be fixed in KMSAN? Or do you mean some kind of an
> extension to the stack depot interface?

Yes, I'll just fix this on the KMSAN side.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 11/15] stackdepot: use read/write lock
  2023-09-13 17:08         ` Andrey Konovalov
@ 2024-01-02 12:59           ` Marco Elver
  2024-01-09  3:27             ` Andrey Konovalov
  0 siblings, 1 reply; 50+ messages in thread
From: Marco Elver @ 2024-01-02 12:59 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: andrey.konovalov, Alexander Potapenko, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Wed, 13 Sept 2023 at 19:09, Andrey Konovalov <andreyknvl@gmail.com> wrote:
>
> On Tue, Sep 5, 2023 at 6:19 PM Marco Elver <elver@google.com> wrote:
> >
> > > Good suggestion. I propose that we keep the rwlock for now, and I'll
> > > check whether the performance is better with percpu-rwsem once I get
> > > to implementing and testing the performance changes. I'll also check
> > > whether percpu-rwsem makes sense for stack ring in tag-based KASAN
> > > modes.
> >
> > I think it's quite obvious that the percpu-rwsem is better. A simple
> > experiment is to measure the ratio of stackdepot hits vs misses. If
> > the ratio is obviously skewed towards hits, then I'd just go with the
> > percpu-rwsem.
> >
> > The performance benefit may not be measurable if you use a small system.
>
> I started looking into using percpu-rwsem, but it appears that it
> doesn't have the irqsave/irqrestore API flavor. I suspect that it
> shouldn't be hard to add it, but I'd rather not pursue this as a part
> of this series.
>
> So I still propose to keep the rwlock for now, and switch to
> percpu-rwsem later together with the other perf changes.

I may have gotten lost in the post-vacation email avalanche and missed
it: did you already send the percpu-rwsem optimization? I am a little
worried about the contention the plain rwlock introduces on big
machines.

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 11/15] stackdepot: use read/write lock
  2024-01-02 12:59           ` Marco Elver
@ 2024-01-09  3:27             ` Andrey Konovalov
  0 siblings, 0 replies; 50+ messages in thread
From: Andrey Konovalov @ 2024-01-09  3:27 UTC (permalink / raw)
  To: Marco Elver
  Cc: andrey.konovalov, Alexander Potapenko, Dmitry Vyukov,
	Vlastimil Babka, kasan-dev, Evgenii Stepanov, Andrew Morton,
	linux-mm, linux-kernel, Andrey Konovalov

On Tue, Jan 2, 2024 at 1:59 PM Marco Elver <elver@google.com> wrote:
>
> > I started looking into using percpu-rwsem, but it appears that it
> > doesn't have the irqsave/irqrestore API flavor. I suspect that it
> > shouldn't be hard to add it, but I'd rather not pursue this as a part
> > of this series.
> >
> > So I still propose to keep the rwlock for now, and switch to
> > percpu-rwsem later together with the other perf changes.
>
> I may have gotten lost in the post-vacation email avalanche and missed
> it: did you already send the percpu-rwsem optimization? I am a little
> worried about the contention the plain rwlock introduces on big
> machines.

I didn't get to working on that part unfortunately :(

I filed https://bugzilla.kernel.org/show_bug.cgi?id=218312 for this.

^ permalink raw reply	[flat|nested] 50+ messages in thread


Thread overview: 50+ messages
2023-08-29 17:11 [PATCH 00/15] stackdepot: allow evicting stack traces andrey.konovalov
2023-08-29 17:11 ` [PATCH 01/15] stackdepot: check disabled flag when fetching andrey.konovalov
2023-08-30  7:40   ` Alexander Potapenko
2023-08-29 17:11 ` [PATCH 02/15] stackdepot: simplify __stack_depot_save andrey.konovalov
2023-08-30  7:41   ` Alexander Potapenko
2023-08-29 17:11 ` [PATCH 03/15] stackdepot: drop valid bit from handles andrey.konovalov
2023-08-30  7:43   ` Alexander Potapenko
2023-08-29 17:11 ` [PATCH 04/15] stackdepot: add depot_fetch_stack helper andrey.konovalov
2023-08-30  7:47   ` Alexander Potapenko
2023-08-29 17:11 ` [PATCH 05/15] stackdepot: use fixed-sized slots for stack records andrey.konovalov
2023-08-30  8:21   ` Alexander Potapenko
2023-09-13 17:07     ` Andrey Konovalov
2023-09-15 10:36       ` Alexander Potapenko
2023-08-29 17:11 ` [PATCH 06/15] stackdepot: fix and clean-up atomic annotations andrey.konovalov
2023-08-30  8:34   ` Marco Elver
2023-09-04 18:45     ` Andrey Konovalov
2023-08-29 17:11 ` [PATCH 07/15] stackdepot: rework helpers for depot_alloc_stack andrey.konovalov
2023-08-29 17:11 ` [PATCH 08/15] stackdepot: rename next_pool_required to new_pool_required andrey.konovalov
2023-08-30  8:29   ` Alexander Potapenko
2023-08-29 17:11 ` [PATCH 09/15] stackdepot: store next pool pointer in new_pool andrey.konovalov
2023-08-29 17:11 ` [PATCH 10/15] stackdepot: store free stack records in a freelist andrey.konovalov
2023-08-29 17:11 ` [PATCH 11/15] stackdepot: use read/write lock andrey.konovalov
2023-08-30  9:13   ` Marco Elver
2023-09-04 18:46     ` Andrey Konovalov
2023-09-05 16:19       ` Marco Elver
2023-09-13 17:08         ` Andrey Konovalov
2024-01-02 12:59           ` Marco Elver
2024-01-09  3:27             ` Andrey Konovalov
2023-08-29 17:11 ` [PATCH 12/15] stackdepot: add refcount for records andrey.konovalov
2023-08-30  9:32   ` Marco Elver
2023-09-04 18:46     ` Andrey Konovalov
2023-09-04 18:55       ` Marco Elver
2023-09-13 17:07         ` Andrey Konovalov
2023-09-01 13:06   ` Kuan-Ying Lee (李冠穎)
2023-09-04 18:46     ` Andrey Konovalov
2023-08-29 17:11 ` [PATCH 13/15] stackdepot: add backwards links to hash table buckets andrey.konovalov
2023-08-30  9:24   ` Marco Elver
2023-09-13 17:07     ` Andrey Konovalov
2023-08-29 17:11 ` [PATCH 14/15] stackdepot: allow users to evict stack traces andrey.konovalov
2023-08-30  9:20   ` Marco Elver
2023-09-04 18:47     ` Andrey Konovalov
2023-08-29 17:11 ` [PATCH 15/15] kasan: use stack_depot_evict for tag-based modes andrey.konovalov
2023-08-30  9:38   ` Marco Elver
2023-09-04 18:48     ` Andrey Konovalov
2023-09-04 18:58       ` Marco Elver
2023-09-13 17:08         ` Andrey Konovalov
2023-08-30  7:46 ` [PATCH 00/15] stackdepot: allow evicting stack traces Vlastimil Babka
2023-09-04 18:45   ` Andrey Konovalov
2023-09-05  2:48     ` Kuan-Ying Lee (李冠穎)
2023-09-13 17:10       ` Andrey Konovalov
