linux-mm.kvack.org archive mirror
* [PATCH v2 0/3] page_owner: print stacks and their counter
@ 2022-09-05  3:10 Oscar Salvador
  2022-09-05  3:10 ` [PATCH v2 1/3] lib/stackdepot: Add a refcount field in stack_record Oscar Salvador
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Oscar Salvador @ 2022-09-05  3:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka,
	Eric Dumazet, Waiman Long, Suren Baghdasaryan, Marco Elver,
	Andrey Konovalov, Alexander Potapenko, Oscar Salvador

Changes v1 -> v2:
	- Add feedback from Michal, Marco and Ammar (Thanks!)

Hi,

page_owner is a great debugging tool that lets us know
about all pages that have been allocated/freed along with their stacktrace.
This comes in very handy when e.g: debugging leaks, as with some scripting
we might be able to see those stacktraces that are allocating pages
but not freeing them.

In my experience, that is one of the most useful cases, but it can get
really tedious to screen through all pages and try to reconstruct the
stack <-> allocated/freed relationship. There is a lot of noise
to filter out.

This patchset aims to fix that by adding new functionality to page_owner.
It creates a new read-only file, "page_owner_stacks",
which prints only the allocating stacktraces and their counter, the
counter being the number of times a stacktrace has allocated minus the
number of times it has freed.

So we have a clear overview of the stacks <-> allocated/freed relationship
without the need to fiddle with pages and try to match free stacktraces
with allocated stacktraces.

This is achieved by adding a new refcount_t field to the stack_record struct,
incrementing that refcount_t every time the same stacktrace allocates
a page, and decrementing it when it frees one. Details can be seen in the
respective patches.

We also create another file called "page_owner_threshold", which lets us
specify a threshold, so that when reading from "page_owner_stacks",
we will only see those stacktraces whose counter is at or above the
threshold we specified.

A PoC can be found below:

# cat /sys/kernel/debug/page_owner_threshold
 0
# cat /sys/kernel/debug/page_owner_stacks > stacks_full.txt
# head -32 stacks_full.txt
 prep_new_page+0x10d/0x180
 get_page_from_freelist+0x1bd6/0x1e10
 __alloc_pages+0x194/0x360
 alloc_page_interleave+0x13/0x90
 new_slab+0x31d/0x530
 ___slab_alloc+0x5d7/0x720
 __slab_alloc.isra.85+0x4a/0x90
 kmem_cache_alloc+0x455/0x4a0
 acpi_ps_alloc_op+0x57/0x8f
 acpi_ps_create_scope_op+0x12/0x23
 acpi_ps_execute_method+0x102/0x2c1
 acpi_ns_evaluate+0x343/0x4da
 acpi_evaluate_object+0x1cb/0x392
 acpi_run_osc+0x135/0x260
 acpi_init+0x165/0x4ed
 do_one_initcall+0x3e/0x200
stack count: 2

 free_pcp_prepare+0x287/0x5c0
 free_unref_page+0x1c/0xd0
 __mmdrop+0x50/0x160
 finish_task_switch+0x249/0x2b0
 __schedule+0x2c3/0x960
 schedule+0x44/0xb0
 futex_wait_queue+0x70/0xd0
 futex_wait+0x160/0x250
 do_futex+0x11c/0x1b0
 __x64_sys_futex+0x5e/0x1d0
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 1

 

# echo 10000 > /sys/kernel/debug/page_owner_threshold
# cat /sys/kernel/debug/page_owner_stacks > stacks_10000.txt
# cat stacks_10000.txt 
 prep_new_page+0x10d/0x180
 get_page_from_freelist+0x1bd6/0x1e10
 __alloc_pages+0x194/0x360
 folio_alloc+0x17/0x40
 page_cache_ra_unbounded+0x96/0x170
 filemap_get_pages+0x23d/0x5e0
 filemap_read+0xbf/0x3a0
 __kernel_read+0x136/0x2f0
 kernel_read_file+0x197/0x2d0
 kernel_read_file_from_fd+0x54/0x90
 __do_sys_finit_module+0x89/0x120
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 36195

 prep_new_page+0x10d/0x180
 get_page_from_freelist+0x1bd6/0x1e10
 __alloc_pages+0x194/0x360
 folio_alloc+0x17/0x40
 page_cache_ra_unbounded+0x96/0x170
 filemap_get_pages+0x23d/0x5e0
 filemap_read+0xbf/0x3a0
 new_sync_read+0x106/0x180
 vfs_read+0x16f/0x190
 ksys_read+0xa5/0xe0
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 44484

 prep_new_page+0x10d/0x180
 get_page_from_freelist+0x1bd6/0x1e10
 __alloc_pages+0x194/0x360
 folio_alloc+0x17/0x40
 page_cache_ra_unbounded+0x96/0x170
 filemap_get_pages+0xdd/0x5e0
 filemap_read+0xbf/0x3a0
 new_sync_read+0x106/0x180
 vfs_read+0x16f/0x190
 ksys_read+0xa5/0xe0
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 17874

Oscar Salvador (3):
  lib/stackdepot: Add a refcount field in stack_record
  mm, page_owner: Add page_owner_stacks file to print out only stacks
    and their counter
  mm,page_owner: Filter out stacks by a threshold counter

 include/linux/stackdepot.h |  15 ++++-
 lib/stackdepot.c           | 121 ++++++++++++++++++++++++++++++++-----
 mm/kasan/common.c          |   3 +-
 mm/page_owner.c            |  88 +++++++++++++++++++++++++--
 4 files changed, 207 insertions(+), 20 deletions(-)

-- 
2.35.3



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 1/3] lib/stackdepot: Add a refcount field in stack_record
  2022-09-05  3:10 [PATCH v2 0/3] page_owner: print stacks and their counter Oscar Salvador
@ 2022-09-05  3:10 ` Oscar Salvador
  2022-09-05 20:57   ` Andrey Konovalov
  2022-09-05  3:10 ` [PATCH v2 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter Oscar Salvador
  2022-09-05  3:10 ` [PATCH v2 3/3] mm,page_owner: Filter out stacks by a threshold counter Oscar Salvador
  2 siblings, 1 reply; 23+ messages in thread
From: Oscar Salvador @ 2022-09-05  3:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka,
	Eric Dumazet, Waiman Long, Suren Baghdasaryan, Marco Elver,
	Andrey Konovalov, Alexander Potapenko, Oscar Salvador

We want to filter page_owner's output and print only those
stacks that have been repeated beyond a certain threshold.
This gives us the chance to get rid of a lot of noise.
In order to do that, we need to keep track of how many times each
allocation stack has been repeated, so we add a new refcount_t field
to the stack_record struct.

Note that this might increase the size of the struct for some
architectures.
E.g: x86_64 is not affected due to alignment, but 32-bit x86 might be.

The alternative would be to have some kind of struct like this:

struct track_stacks {
	struct stack_record *stack;
	struct track_stacks *next;
	refcount_t stack_count;
};

But that would imply performing more allocations and gluing everything
together, which would make the code more complex, so I think that
going with a new field in struct stack_record is good enough.

Note that in __set_page_owner_handle(), page_owner->handle is set,
and in __reset_page_owner(), page_owner->free_handle is set.

We are interested in page_owner->handle, so when __set_page_owner()
gets called, we derive the stack_record struct from page_owner->handle
and increment its refcount_t field; and when __reset_page_owner()
gets called, we derive its stack_record from page_owner->handle
and decrement its refcount_t field.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/stackdepot.h | 13 ++++++-
 lib/stackdepot.c           | 79 +++++++++++++++++++++++++++++++-------
 mm/kasan/common.c          |  3 +-
 mm/page_owner.c            | 14 +++++--
 4 files changed, 89 insertions(+), 20 deletions(-)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index bc2797955de9..4e3a88f135ee 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -15,9 +15,16 @@
 
 typedef u32 depot_stack_handle_t;
 
+enum stack_depot_action {
+	STACK_DEPOT_ACTION_NONE,
+	STACK_DEPOT_ACTION_COUNT,
+};
+
 depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 					unsigned int nr_entries,
-					gfp_t gfp_flags, bool can_alloc);
+					gfp_t gfp_flags, bool can_alloc,
+					enum stack_depot_action action);
+void stack_depot_dec_count(depot_stack_handle_t handle);
 
 /*
  * Every user of stack depot has to call stack_depot_init() during its own init
@@ -55,6 +62,10 @@ static inline int stack_depot_early_init(void)	{ return 0; }
 
 depot_stack_handle_t stack_depot_save(unsigned long *entries,
 				      unsigned int nr_entries, gfp_t gfp_flags);
+depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
+					     unsigned int nr_entries,
+					     gfp_t gfp_flags,
+					     enum stack_depot_action action);
 
 unsigned int stack_depot_fetch(depot_stack_handle_t handle,
 			       unsigned long **entries);
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index e73fda23388d..a806ef58a385 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -64,6 +64,7 @@ struct stack_record {
 	u32 hash;			/* Hash in the hastable */
 	u32 size;			/* Number of frames in the stack */
 	union handle_parts handle;
+	refcount_t count;		/* Number of the same repeated stacks */
 	unsigned long entries[];	/* Variable-sized array of entries. */
 };
 
@@ -140,6 +141,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 	stack->handle.slabindex = depot_index;
 	stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
 	stack->handle.valid = 1;
+	refcount_set(&stack->count, 1);
 	memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
 	depot_offset += required_size;
 
@@ -341,6 +343,29 @@ void stack_depot_print(depot_stack_handle_t stack)
 }
 EXPORT_SYMBOL_GPL(stack_depot_print);
 
+static struct stack_record *stack_depot_getstack(depot_stack_handle_t handle)
+{
+	union handle_parts parts = { .handle = handle };
+	void *slab;
+	size_t offset = parts.offset << STACK_ALLOC_ALIGN;
+	struct stack_record *stack;
+
+	if(!handle)
+		return NULL;
+
+	if (parts.slabindex > depot_index) {
+		WARN(1, "slab index %d out of bounds (%d) for stack id %08x\n",
+		     parts.slabindex, depot_index, handle);
+		return NULL;
+	}
+	slab = stack_slabs[parts.slabindex];
+	if (!slab)
+		return NULL;
+
+	stack = slab + offset;
+	return stack;
+}
+
 /**
  * stack_depot_fetch - Fetch stack entries from a depot
  *
@@ -353,30 +378,42 @@ EXPORT_SYMBOL_GPL(stack_depot_print);
 unsigned int stack_depot_fetch(depot_stack_handle_t handle,
 			       unsigned long **entries)
 {
-	union handle_parts parts = { .handle = handle };
-	void *slab;
-	size_t offset = parts.offset << STACK_ALLOC_ALIGN;
 	struct stack_record *stack;
 
 	*entries = NULL;
 	if (!handle)
 		return 0;
 
-	if (parts.slabindex > depot_index) {
-		WARN(1, "slab index %d out of bounds (%d) for stack id %08x\n",
-			parts.slabindex, depot_index, handle);
-		return 0;
-	}
-	slab = stack_slabs[parts.slabindex];
-	if (!slab)
+	stack = stack_depot_getstack(handle);
+	if (!stack)
 		return 0;
-	stack = slab + offset;
 
 	*entries = stack->entries;
 	return stack->size;
 }
 EXPORT_SYMBOL_GPL(stack_depot_fetch);
 
+static void stack_depot_inc_count(struct stack_record *stack)
+{
+	refcount_inc(&stack->count);
+}
+
+void stack_depot_dec_count(depot_stack_handle_t handle)
+{
+	struct stack_record *stack = NULL;
+
+	stack = stack_depot_getstack(handle);
+	if (stack) {
+		/*
+		 * page_owner creates some stacks via create_dummy_stack().
+		 * We are not interested in those, so make sure we only decrement
+		 * "valid" stacks.
+		 */
+		if (refcount_read(&stack->count) > 1)
+			refcount_dec(&stack->count);
+	}
+}
+
 /**
  * __stack_depot_save - Save a stack trace from an array
  *
@@ -402,7 +439,8 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
  */
 depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 					unsigned int nr_entries,
-					gfp_t alloc_flags, bool can_alloc)
+					gfp_t alloc_flags, bool can_alloc,
+					enum stack_depot_action action)
 {
 	struct stack_record *found = NULL, **bucket;
 	depot_stack_handle_t retval = 0;
@@ -488,8 +526,11 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 		/* Nobody used this memory, ok to free it. */
 		free_pages((unsigned long)prealloc, STACK_ALLOC_ORDER);
 	}
-	if (found)
+	if (found) {
 		retval = found->handle.handle;
+		if (action == STACK_DEPOT_ACTION_COUNT)
+			stack_depot_inc_count(found);
+	}
 fast_exit:
 	return retval;
 }
@@ -511,6 +552,16 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
 				      unsigned int nr_entries,
 				      gfp_t alloc_flags)
 {
-	return __stack_depot_save(entries, nr_entries, alloc_flags, true);
+	return __stack_depot_save(entries, nr_entries, alloc_flags, true,
+				  STACK_DEPOT_ACTION_NONE);
 }
 EXPORT_SYMBOL_GPL(stack_depot_save);
+
+depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
+					     unsigned int nr_entries,
+					     gfp_t alloc_flags,
+					     enum stack_depot_action action)
+{
+	return __stack_depot_save(entries, nr_entries, alloc_flags, true, action);
+}
+EXPORT_SYMBOL_GPL(stack_depot_save_action);
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 69f583855c8b..8077c6e70815 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -36,7 +36,8 @@ depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc)
 	unsigned int nr_entries;
 
 	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
-	return __stack_depot_save(entries, nr_entries, flags, can_alloc);
+	return __stack_depot_save(entries, nr_entries, flags, can_alloc,
+				  STACK_DEPOT_ACTION_NONE);
 }
 
 void kasan_set_track(struct kasan_track *track, gfp_t flags)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index e4c6f3f1695b..8730f377fa91 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -106,7 +106,8 @@ static inline struct page_owner *get_page_owner(struct page_ext *page_ext)
 	return (void *)page_ext + page_owner_ops.offset;
 }
 
-static noinline depot_stack_handle_t save_stack(gfp_t flags)
+static noinline depot_stack_handle_t save_stack(gfp_t flags,
+						enum stack_depot_action action)
 {
 	unsigned long entries[PAGE_OWNER_STACK_DEPTH];
 	depot_stack_handle_t handle;
@@ -125,7 +126,7 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
 	current->in_page_owner = 1;
 
 	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
-	handle = stack_depot_save(entries, nr_entries, flags);
+	handle = stack_depot_save_action(entries, nr_entries, flags, action);
 	if (!handle)
 		handle = failure_handle;
 
@@ -138,6 +139,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
 	int i;
 	struct page_ext *page_ext;
 	depot_stack_handle_t handle;
+	depot_stack_handle_t alloc_handle;
 	struct page_owner *page_owner;
 	u64 free_ts_nsec = local_clock();
 
@@ -145,7 +147,10 @@ void __reset_page_owner(struct page *page, unsigned short order)
 	if (unlikely(!page_ext))
 		return;
 
-	handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
+	page_owner = get_page_owner(page_ext);
+	alloc_handle = page_owner->handle;
+
+	handle = save_stack(GFP_NOWAIT | __GFP_NOWARN, STACK_DEPOT_ACTION_NONE);
 	for (i = 0; i < (1 << order); i++) {
 		__clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
 		page_owner = get_page_owner(page_ext);
@@ -153,6 +158,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
 		page_owner->free_ts_nsec = free_ts_nsec;
 		page_ext = page_ext_next(page_ext);
 	}
+	stack_depot_dec_count(alloc_handle);
 }
 
 static inline void __set_page_owner_handle(struct page_ext *page_ext,
@@ -189,7 +195,7 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
 	if (unlikely(!page_ext))
 		return;
 
-	handle = save_stack(gfp_mask);
+	handle = save_stack(gfp_mask, STACK_DEPOT_ACTION_COUNT);
 	__set_page_owner_handle(page_ext, handle, order, gfp_mask);
 }
 
-- 
2.35.3




* [PATCH v2 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter
  2022-09-05  3:10 [PATCH v2 0/3] page_owner: print stacks and their counter Oscar Salvador
  2022-09-05  3:10 ` [PATCH v2 1/3] lib/stackdepot: Add a refcount field in stack_record Oscar Salvador
@ 2022-09-05  3:10 ` Oscar Salvador
  2022-09-05 12:57   ` Marco Elver
  2022-09-05 22:20   ` kernel test robot
  2022-09-05  3:10 ` [PATCH v2 3/3] mm,page_owner: Filter out stacks by a threshold counter Oscar Salvador
  2 siblings, 2 replies; 23+ messages in thread
From: Oscar Salvador @ 2022-09-05  3:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka,
	Eric Dumazet, Waiman Long, Suren Baghdasaryan, Marco Elver,
	Andrey Konovalov, Alexander Potapenko, Oscar Salvador

We might only be interested in knowing about the stacks <-> count
relationship, so instead of having to fiddle with page_owner
output and screen through pfns, let us add a new file called
'page_owner_stacks' that does just that.
By reading that file, we will get all the stacktraces followed by
their counters, so we can have a more global view.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/stackdepot.h |  1 +
 lib/stackdepot.c           | 40 ++++++++++++++++++++++++++++++++++++++
 mm/page_owner.c            | 25 ++++++++++++++++++++++++
 3 files changed, 66 insertions(+)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 4e3a88f135ee..19d3f8295df8 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -25,6 +25,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 					gfp_t gfp_flags, bool can_alloc,
 					enum stack_depot_action action);
 void stack_depot_dec_count(depot_stack_handle_t handle);
+int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos);
 
 /*
  * Every user of stack depot has to call stack_depot_init() during its own init
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index a806ef58a385..a198b2dbe3fb 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -565,3 +565,43 @@ depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
 	return __stack_depot_save(entries, nr_entries, alloc_flags, true, action);
 }
 EXPORT_SYMBOL_GPL(stack_depot_save_action);
+
+int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos)
+{
+	int i = *pos, ret = 0;
+	struct stack_record **stacks, *stack;
+	static struct stack_record *last = NULL;
+	unsigned long stack_table_entries = stack_hash_mask + 1;
+
+	/* Continue from the last stack if we have one */
+	if (last) {
+		stack = last->next;
+	} else {
+new_table:
+		stacks = &stack_table[i];
+		stack = (struct stack_record *)stacks;
+	}
+
+	for (; stack; stack = stack->next) {
+		if (!stack->size || stack->size < 0 ||
+		    stack->size > size || stack->handle.valid != 1 ||
+		    refcount_read(&stack->count) < 1)
+			continue;
+
+		ret += stack_trace_snprint(buf, size, stack->entries, stack->size, 0);
+		ret += scnprintf(buf + ret, size - ret, "stack count: %d\n\n",
+				 refcount_read(&stack->count));
+		last = stack;
+		return ret;
+	}
+
+	i++;
+	*pos = i;
+	last = NULL;
+
+	/* Keep looking all tables for valid stacks */
+	if (i < stack_table_entries)
+		goto new_table;
+
+	return 0;
+}
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 8730f377fa91..d88e6b4aefa0 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -664,6 +664,29 @@ static void init_early_allocated_pages(void)
 		init_zones_in_node(pgdat);
 }
 
+static ssize_t read_page_owner_stacks(struct file *file, char __user *buf,
+				      size_t count, loff_t *pos)
+{
+	char *kbuf;
+	int ret = 0;
+
+	count = min_t(size_t, count, PAGE_SIZE);
+	kbuf = kmalloc(count, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	ret += stack_depot_print_stacks_threshold(kbuf, count, pos);
+	if (copy_to_user(buf, kbuf, ret))
+		ret = -EFAULT;
+
+	kfree(kbuf);
+	return ret;
+}
+
+static const struct file_operations proc_page_owner_stacks = {
+	.read = read_page_owner_stacks,
+};
+
 static const struct file_operations proc_page_owner_operations = {
 	.read		= read_page_owner,
 };
@@ -677,6 +700,8 @@ static int __init pageowner_init(void)
 
 	debugfs_create_file("page_owner", 0400, NULL, NULL,
 			    &proc_page_owner_operations);
+	debugfs_create_file("page_owner_stacks", 0400, NULL, NULL,
+			    &proc_page_owner_stacks);
 
 	return 0;
 }
-- 
2.35.3




* [PATCH v2 3/3] mm,page_owner: Filter out stacks by a threshold counter
  2022-09-05  3:10 [PATCH v2 0/3] page_owner: print stacks and their counter Oscar Salvador
  2022-09-05  3:10 ` [PATCH v2 1/3] lib/stackdepot: Add a refcount field in stack_record Oscar Salvador
  2022-09-05  3:10 ` [PATCH v2 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter Oscar Salvador
@ 2022-09-05  3:10 ` Oscar Salvador
  2022-09-05 10:51   ` Ammar Faizi
  2022-09-19 15:23   ` Vlastimil Babka
  2 siblings, 2 replies; 23+ messages in thread
From: Oscar Salvador @ 2022-09-05  3:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka,
	Eric Dumazet, Waiman Long, Suren Baghdasaryan, Marco Elver,
	Andrey Konovalov, Alexander Potapenko, Oscar Salvador

We want to be able to filter the output on a threshold basis;
this way we can get rid of a lot of noise and focus only on those
stacks which have an allegedly high counter.

We can control the threshold value by a new file called
'page_owner_threshold', which is 0 by default.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/stackdepot.h |  3 ++-
 lib/stackdepot.c           |  6 +++--
 mm/page_owner.c            | 51 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 19d3f8295df8..742038216cd0 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -25,7 +25,8 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 					gfp_t gfp_flags, bool can_alloc,
 					enum stack_depot_action action);
 void stack_depot_dec_count(depot_stack_handle_t handle);
-int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos);
+int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos,
+				       unsigned long threshold);
 
 /*
  * Every user of stack depot has to call stack_depot_init() during its own init
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index a198b2dbe3fb..a31e882853ab 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -566,7 +566,8 @@ depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
 }
 EXPORT_SYMBOL_GPL(stack_depot_save_action);
 
-int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos)
+int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos,
+				       unsigned long threshold)
 {
 	int i = *pos, ret = 0;
 	struct stack_record **stacks, *stack;
@@ -585,7 +586,8 @@ int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos)
 	for (; stack; stack = stack->next) {
 		if (!stack->size || stack->size < 0 ||
 		    stack->size > size || stack->handle.valid != 1 ||
-		    refcount_read(&stack->count) < 1)
+		    refcount_read(&stack->count) < 1 ||
+		    refcount_read(&stack->count) < threshold)
 			continue;
 
 		ret += stack_trace_snprint(buf, size, stack->entries, stack->size, 0);
diff --git a/mm/page_owner.c b/mm/page_owner.c
index d88e6b4aefa0..5b895d347c5f 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -43,6 +43,8 @@ static depot_stack_handle_t early_handle;
 
 static void init_early_allocated_pages(void);
 
+static unsigned long threshold;
+
 static int __init early_page_owner_param(char *buf)
 {
 	int ret = kstrtobool(buf, &page_owner_enabled);
@@ -675,7 +677,7 @@ static ssize_t read_page_owner_stacks(struct file *file, char __user *buf,
 	if (!kbuf)
 		return -ENOMEM;
 
-	ret += stack_depot_print_stacks_threshold(kbuf, count, pos);
+	ret += stack_depot_print_stacks_threshold(kbuf, count, pos, threshold);
 	if (copy_to_user(buf, kbuf, ret))
 		ret = -EFAULT;
 
@@ -683,6 +685,51 @@ static ssize_t read_page_owner_stacks(struct file *file, char __user *buf,
 	return ret;
 }
 
+static int page_owner_threshold_show(struct seq_file *p, void *v)
+{
+	 seq_printf(p, "%lu\n", threshold);
+	return 0;
+}
+
+static ssize_t write_page_owner_threshold(struct file *file, const char __user *buf,
+					  size_t count, loff_t *pos)
+{
+	char *kbuf;
+	int ret = 0;
+
+	count = min_t(size_t, count, PAGE_SIZE);
+	kbuf = kmalloc(count, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	if (copy_from_user(kbuf, buf, count)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	kbuf[count - 1] = '\0';
+
+	ret = kstrtoul(kbuf, 10, &threshold);
+
+out:
+	kfree(kbuf);
+	return ret ? ret : count;
+}
+
+static int open_page_owner_threshold(struct inode *inode, struct file *file)
+{
+	return single_open(file, page_owner_threshold_show, NULL);
+}
+
+
+static const struct file_operations proc_page_owner_threshold = {
+	.open = open_page_owner_threshold,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.write = write_page_owner_threshold,
+	.release = single_release,
+};
+
 static const struct file_operations proc_page_owner_stacks = {
 	.read = read_page_owner_stacks,
 };
@@ -702,6 +749,8 @@ static int __init pageowner_init(void)
 			    &proc_page_owner_operations);
 	debugfs_create_file("page_owner_stacks", 0400, NULL, NULL,
 			    &proc_page_owner_stacks);
+	debugfs_create_file("page_owner_threshold", 0600, NULL, NULL,
+			     &proc_page_owner_threshold);
 
 	return 0;
 }
-- 
2.35.3




* Re: [PATCH v2 3/3] mm,page_owner: Filter out stacks by a threshold counter
  2022-09-05  3:10 ` [PATCH v2 3/3] mm,page_owner: Filter out stacks by a threshold counter Oscar Salvador
@ 2022-09-05 10:51   ` Ammar Faizi
  2022-09-05 11:31     ` Michal Hocko
  2022-09-19 15:23   ` Vlastimil Babka
  1 sibling, 1 reply; 23+ messages in thread
From: Ammar Faizi @ 2022-09-05 10:51 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, Linux Kernel Mailing List,
	Linux Memory Management Mailing List, Michal Hocko,
	Vlastimil Babka, Eric Dumazet, Waiman Long, Suren Baghdasaryan,
	Marco Elver, Andrey Konovalov, Alexander Potapenko

On Mon, 5 Sep 2022 05:10:12 +0200, Oscar Salvador wrote:
> +static int page_owner_threshold_show(struct seq_file *p, void *v)
> +{
> +	 seq_printf(p, "%lu\n", threshold);

Remove a slipped leading 0x20 space here (before seq_printf()).

> +	return 0;
> +}
> +
> +static ssize_t write_page_owner_threshold(struct file *file, const char __user *buf,
> +					  size_t count, loff_t *pos)
> +{
> +	char *kbuf;
> +	int ret = 0;
> +
> +	count = min_t(size_t, count, PAGE_SIZE);
> +	kbuf = kmalloc(count, GFP_KERNEL);
> +	if (!kbuf)
> +		return -ENOMEM;
> +
> +	if (copy_from_user(kbuf, buf, count)) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	kbuf[count - 1] = '\0';
> +
> +	ret = kstrtoul(kbuf, 10, &threshold);
> +
> +out:
> +	kfree(kbuf);
> +	return ret ? ret : count;
> +}

Still the same comment on this, kmalloc() is not really needed here.
Capping the size to PAGE_SIZE (usually 4K) is too big. `unsigned long`
is 64-bit at most, this means the max val is 18446744073709551615
(20 chars). The lifetime of @kbuf is very short as well, using a stack
allocated array of chars is fine?

Untested:

static ssize_t write_page_owner_threshold(struct file *file, const char __user *buf,
					  size_t count, loff_t *pos)
{
	char kbuf[21];
	int ret;

	count = min_t(size_t, count, sizeof(kbuf));
	if (copy_from_user(kbuf, buf, count))
		return -EFAULT;

	kbuf[count - 1] = '\0';
	ret = kstrtoul(kbuf, 10, &threshold);
	return ret ? ret : count;
}

-- 
Ammar Faizi




* Re: [PATCH v2 3/3] mm,page_owner: Filter out stacks by a threshold counter
  2022-09-05 10:51   ` Ammar Faizi
@ 2022-09-05 11:31     ` Michal Hocko
  2022-09-05 11:54       ` Ammar Faizi
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2022-09-05 11:31 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Oscar Salvador, Andrew Morton, Linux Kernel Mailing List,
	Linux Memory Management Mailing List, Vlastimil Babka,
	Eric Dumazet, Waiman Long, Suren Baghdasaryan, Marco Elver,
	Andrey Konovalov, Alexander Potapenko

On Mon 05-09-22 17:51:37, Ammar Faizi wrote:
> On Mon, 5 Sep 2022 05:10:12 +0200, Oscar Salvador wrote:
> > +static int page_owner_threshold_show(struct seq_file *p, void *v)
> > +{
> > +	 seq_printf(p, "%lu\n", threshold);
> 
> Remove a slipped leading 0x20 space here (before seq_printf()).
> 
> > +	return 0;
> > +}
> > +
> > +static ssize_t write_page_owner_threshold(struct file *file, const char __user *buf,
> > +					  size_t count, loff_t *pos)
> > +{
> > +	char *kbuf;
> > +	int ret = 0;
> > +
> > +	count = min_t(size_t, count, PAGE_SIZE);
> > +	kbuf = kmalloc(count, GFP_KERNEL);
> > +	if (!kbuf)
> > +		return -ENOMEM;
> > +
> > +	if (copy_from_user(kbuf, buf, count)) {
> > +		ret = -EFAULT;
> > +		goto out;
> > +	}
> > +
> > +	kbuf[count - 1] = '\0';
> > +
> > +	ret = kstrtoul(kbuf, 10, &threshold);
> > +
> > +out:
> > +	kfree(kbuf);
> > +	return ret ? ret : count;
> > +}
> 
> Still the same comment on this, kmalloc() is not really needed here.
> Capping the size to PAGE_SIZE (usually 4K) is too big. `unsigned long`
> is 64-bit at most, this means the max val is 18446744073709551615
> (20 chars). The lifetime of @kbuf is very short as well, using a stack
> allocated array of chars is fine?
> 
> Untested:
> 
> static ssize_t write_page_owner_threshold(struct file *file, const char __user *buf,
> 					  size_t count, loff_t *pos)
> {
> 	char kbuf[21];
> 	int ret;
> 
> 	count = min_t(size_t, count, sizeof(kbuf));
> 	if (copy_from_user(kbuf, buf, count))
> 		return -EFAULT;
> 
> 	kbuf[count - 1] = '\0';
> 	ret = kstrtoul(kbuf, 10, &threshold);
> 	return ret ? ret : count;
> }

Isn't there a proc_dointvec counterpart for debugfs?

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v2 3/3] mm,page_owner: Filter out stacks by a threshold counter
  2022-09-05 11:31     ` Michal Hocko
@ 2022-09-05 11:54       ` Ammar Faizi
  2022-09-05 12:02         ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Ammar Faizi @ 2022-09-05 11:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Ammar Faizi, Oscar Salvador, Andrew Morton,
	Linux Kernel Mailing List, Linux Memory Management Mailing List,
	Vlastimil Babka, Eric Dumazet, Waiman Long, Suren Baghdasaryan,
	Marco Elver, Andrey Konovalov, Alexander Potapenko

On Mon, 5 Sep 2022 13:31:02 +0200, Michal Hocko wrote:
> On Mon 05-09-22 17:51:37, Ammar Faizi wrote:
> > On Mon, 5 Sep 2022 05:10:12 +0200, Oscar Salvador wrote:
> > > +static int page_owner_threshold_show(struct seq_file *p, void *v)
> > > +{
> > > +	 seq_printf(p, "%lu\n", threshold);
> > 
> > Remove a slipped leading 0x20 space here (before seq_printf()).
> > 
> > > +	return 0;
> > > +}
> > > +
> > > +static ssize_t write_page_owner_threshold(struct file *file, const char __user *buf,
> > > +					  size_t count, loff_t *pos)
> > > +{
> > > +	char *kbuf;
> > > +	int ret = 0;
> > > +
> > > +	count = min_t(size_t, count, PAGE_SIZE);
> > > +	kbuf = kmalloc(count, GFP_KERNEL);
> > > +	if (!kbuf)
> > > +		return -ENOMEM;
> > > +
> > > +	if (copy_from_user(kbuf, buf, count)) {
> > > +		ret = -EFAULT;
> > > +		goto out;
> > > +	}
> > > +
> > > +	kbuf[count - 1] = '\0';
> > > +
> > > +	ret = kstrtoul(kbuf, 10, &threshold);
> > > +
> > > +out:
> > > +	kfree(kbuf);
> > > +	return ret ? ret : count;
> > > +}
> > 
> > Still the same comment on this, kmalloc() is not really needed here.
> > Capping the size to PAGE_SIZE (usually 4K) is too big. `unsigned long`
> > is 64-bit at most, this means the max val is 18446744073709551615
> > (20 chars). The lifetime of @kbuf is very short as well, using a stack
> > allocated array of chars is fine?
> > 
> > Untested:
> > 
> > static ssize_t write_page_owner_threshold(struct file *file, const char __user *buf,
> > 					  size_t count, loff_t *pos)
> > {
> > 	char kbuf[21];
> > 	int ret;
> > 
> > 	count = min_t(size_t, count, sizeof(kbuf));
> > 	if (copy_from_user(kbuf, buf, count))
> > 		return -EFAULT;
> > 
> > 	kbuf[count - 1] = '\0';
> > 	ret = kstrtoul(kbuf, 10, &threshold);
> > 	return ret ? ret : count;
> > }
> 
> Isn't there a proc_dointvec counterpart for debugfs?

Ah, well. If that's much simpler, we should go with that. I am not
familiar with the proc_dointvec() interface, so I can't say much about it.

Thanks for the comment. TIL.
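
As a quick sanity check of the 20-char claim above, here is a tiny
userspace sketch (plain C, not kernel code; the helper name is made up):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Longest decimal rendering of a 64-bit value: 18446744073709551615. */
size_t max_u64_decimal_len(void)
{
	char buf[32];

	snprintf(buf, sizeof(buf), "%" PRIu64, UINT64_MAX);
	return strlen(buf);
}
```

strlen() comes back as 20, so 21 bytes (20 digits plus the NUL
terminator) is indeed enough for any `unsigned long`.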

-- 
Ammar Faizi



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/3] mm,page_owner: Filter out stacks by a threshold counter
  2022-09-05 11:54       ` Ammar Faizi
@ 2022-09-05 12:02         ` Michal Hocko
  2022-09-05 12:42           ` Ammar Faizi
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2022-09-05 12:02 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Oscar Salvador, Andrew Morton, Linux Kernel Mailing List,
	Linux Memory Management Mailing List, Vlastimil Babka,
	Eric Dumazet, Waiman Long, Suren Baghdasaryan, Marco Elver,
	Andrey Konovalov, Alexander Potapenko

On Mon 05-09-22 18:54:59, Ammar Faizi wrote:
> On Mon, 5 Sep 2022 13:31:02 +0200, Michal Hocko wrote:
[...]
> > > static ssize_t write_page_owner_threshold(struct file *file, const char __user *buf,
> > > 					  size_t count, loff_t *pos)
> > > {
> > > 	char kbuf[21];
> > > 	int ret;
> > > 
> > > 	count = min_t(size_t, count, sizeof(kbuf));
> > > 	if (copy_from_user(kbuf, buf, count))
> > > 		return -EFAULT;
> > > 
> > > 	kbuf[count - 1] = '\0';
> > > 	ret = kstrtoul(kbuf, 10, &threshold);
> > > 	return ret ? ret : count;
> > > }
> > 
> > Isn't there a proc_dointvec counterpart for debugfs?
> 
> Ah, well. If that's much simpler, we should go with that. I am not
> familiar with the proc_dointvec() interface, so I can't say much about it.

Just to clarify: proc_dointvec is rather specific to the proc/sysctl
interface. I was too lazy to check whether debugfs has something similar
available. Maybe writing to debugfs is not all that common, but I would
expect shared code for writing a simple value to be there.

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v2 3/3] mm,page_owner: Filter out stacks by a threshold counter
  2022-09-05 12:02         ` Michal Hocko
@ 2022-09-05 12:42           ` Ammar Faizi
  0 siblings, 0 replies; 23+ messages in thread
From: Ammar Faizi @ 2022-09-05 12:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Ammar Faizi, Oscar Salvador, Andrew Morton,
	Linux Kernel Mailing List, Linux Memory Management Mailing List,
	Vlastimil Babka, Eric Dumazet, Waiman Long, Suren Baghdasaryan,
	Marco Elver, Andrey Konovalov, Alexander Potapenko

On Mon, 5 Sep 2022 14:02:09 +0200, Michal Hocko wrote:
> On Mon 05-09-22 18:54:59, Ammar Faizi wrote:
> > On Mon, 5 Sep 2022 13:31:02 +0200, Michal Hocko wrote:
> [...]
> > > > static ssize_t write_page_owner_threshold(struct file *file, const char __user *buf,
> > > > 					  size_t count, loff_t *pos)
> > > > {
> > > > 	char kbuf[21];
> > > > 	int ret;
> > > > 
> > > > 	count = min_t(size_t, count, sizeof(kbuf));
> > > > 	if (copy_from_user(kbuf, buf, count))
> > > > 		return -EFAULT;
> > > > 
> > > > 	kbuf[count - 1] = '\0';
> > > > 	ret = kstrtoul(kbuf, 10, &threshold);
> > > > 	return ret ? ret : count;
> > > > }
> > > 
> > > Isn't there a proc_dointvec counterpart for debugfs?
> > 
> > Ah, well. If that's much simpler, we should go with that. I am not
> > familiar with the proc_dointvec() interface, so I can't say much about it.
> 
> Just to clarify: proc_dointvec is rather specific to the proc/sysctl
> interface. I was too lazy to check whether debugfs has something similar
> available. Maybe writing to debugfs is not all that common, but I would
> expect shared code for writing a simple value to be there.

I took a look, there is DEFINE_SIMPLE_ATTRIBUTE().

Ref: https://github.com/torvalds/linux/blob/v6.0-rc4/include/linux/fs.h#L3458-L3487

It looks much simpler to me. Untested, but it is something like this:

-----------------
static int page_owner_threshold_get(void *data, u64 *val)
{
	*val = threshold;
	return 0;
}

static int page_owner_threshold_set(void *data, u64 val)
{
	threshold = val;
	return 0;
}

DEFINE_SIMPLE_ATTRIBUTE(proc_page_owner_threshold, &page_owner_threshold_get,
			&page_owner_threshold_set, "%llu");
-----------------

And then the init should be the same:

	debugfs_create_file("page_owner_threshold", 0600, NULL, NULL,
			    &proc_page_owner_threshold);
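
For illustration, the get/set pair boils down to this userspace model
(plain C; DEFINE_SIMPLE_ATTRIBUTE itself generates the file_operations
glue in the kernel, which is not reproduced here):

```c
#include <stdint.h>

static unsigned long threshold;

/* Mirrors page_owner_threshold_get(): hand the current value back. */
int threshold_get(void *data, uint64_t *val)
{
	(void)data;
	*val = threshold;
	return 0;
}

/* Mirrors page_owner_threshold_set(): store the new value. */
int threshold_set(void *data, uint64_t val)
{
	(void)data;
	threshold = (unsigned long)val;
	return 0;
}
```

The macro only has to wire these two callbacks up to read/write
handlers that parse and format the value with the given format string.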


-- 
Ammar Faizi




* Re: [PATCH v2 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter
  2022-09-05  3:10 ` [PATCH v2 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter Oscar Salvador
@ 2022-09-05 12:57   ` Marco Elver
  2022-09-05 13:00     ` Marco Elver
  2022-09-06  7:43     ` Oscar Salvador
  2022-09-05 22:20   ` kernel test robot
  1 sibling, 2 replies; 23+ messages in thread
From: Marco Elver @ 2022-09-05 12:57 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko,
	Vlastimil Babka, Eric Dumazet, Waiman Long, Suren Baghdasaryan,
	Andrey Konovalov, Alexander Potapenko

On Mon, Sep 05, 2022 at 05:10AM +0200, Oscar Salvador wrote:
[...]
> +int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos)

Can you add kernel-doc comment what this does (and also update
accordingly in 3/3 when you add 'threshold').

From what I see it prints *all* stacks that have a non-zero count.
Correct?

If so, should this be called stack_depot_print_all_count() (having
stack(s) in the name twice doesn't make it more obvious what it does)?
Then in the follow-up patch you add the 'threshold' arg.

> +{
> +	int i = *pos, ret = 0;
> +	struct stack_record **stacks, *stack;
> +	static struct stack_record *last = NULL;
> +	unsigned long stack_table_entries = stack_hash_mask + 1;
> +
> +	/* Continue from the last stack if we have one */
> +	if (last) {
> +		stack = last->next;

This is dead code?

> +	} else {
> +new_table:
> +		stacks = &stack_table[i];
> +		stack = (struct stack_record *)stacks;
> +	}
> +
> +	for (; stack; stack = stack->next) {
> +		if (!stack->size || stack->size < 0 ||
> +		    stack->size > size || stack->handle.valid != 1 ||
> +		    refcount_read(&stack->count) < 1)
> +			continue;
> +
> +		ret += stack_trace_snprint(buf, size, stack->entries, stack->size, 0);
> +		ret += scnprintf(buf + ret, size - ret, "stack count: %d\n\n",
> +				 refcount_read(&stack->count));
> +		last = stack;
> +		return ret;
> +	}
> +
> +	i++;
> +	*pos = i;
> +	last = NULL;
> +
> +	/* Keep looking all tables for valid stacks */
> +	if (i < stack_table_entries)
> +		goto new_table;
> +
> +	return 0;
> +}

Either I'm missing something really obvious, or the above function can
be simplified to just this (untested!):

	int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos)
	{
		const unsigned long stack_table_entries = stack_hash_mask + 1;

		/* Iterate over all tables for valid stacks. */
		for (; *pos < stack_table_entries; (*pos)++) {
			for (struct stack_record *stack = stack_table[*pos]; stack; stack = stack->next) {
				if (!stack->size || stack->size < 0 || stack->size > size ||
				    stack->handle.valid != 1 || refcount_read(&stack->count) < 1)
					continue;

			int ret = stack_trace_snprint(buf, size, stack->entries, stack->size, 0);

			ret += scnprintf(buf + ret, size - ret, "stack count: %d\n\n",
					 refcount_read(&stack->count));
			return ret;
			}
		}

		return 0;
	}
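
The cursor-only scheme can be modeled in userspace like this
(hypothetical node type, plain C; note that resuming with an unchanged
*pos rescans the bucket from its head, which is what the original code
works around with 'last'):

```c
#include <stddef.h>

struct node {
	int count;
	struct node *next;
};

#define TABLE_ENTRIES 4
struct node *table[TABLE_ENTRIES];

/*
 * Return the first node with a positive count at or after bucket *pos,
 * leaving *pos at the bucket where the scan stopped; NULL means done.
 * Like the simplified function above, it always rescans bucket *pos
 * from its head, so the caller cannot resume past a returned node.
 */
struct node *next_counted(long *pos)
{
	struct node *n;

	for (; *pos < TABLE_ENTRIES; (*pos)++) {
		for (n = table[*pos]; n; n = n->next) {
			if (n->count > 0)
				return n;
		}
	}
	return NULL;
}
```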

> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 8730f377fa91..d88e6b4aefa0 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -664,6 +664,29 @@ static void init_early_allocated_pages(void)
>  		init_zones_in_node(pgdat);
>  }
>  
> +static ssize_t read_page_owner_stacks(struct file *file, char __user *buf,
> +				      size_t count, loff_t *pos)
> +{
> +	char *kbuf;
> +	int ret = 0;
> +
> +	count = min_t(size_t, count, PAGE_SIZE);
> +	kbuf = kmalloc(count, GFP_KERNEL);
> +	if (!kbuf)
> +		return -ENOMEM;
> +
> +	ret += stack_depot_print_stacks_threshold(kbuf, count, pos);

If I understood right, this will print *all* stacks that have non-zero
count, and this isn't related to page_owner per-se. Correct?

This might not be a problem right now, but once there might be more
users that want to count stack usage, you'll end up with page_owner +
other stacks here.



* Re: [PATCH v2 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter
  2022-09-05 12:57   ` Marco Elver
@ 2022-09-05 13:00     ` Marco Elver
  2022-09-06  7:43     ` Oscar Salvador
  1 sibling, 0 replies; 23+ messages in thread
From: Marco Elver @ 2022-09-05 13:00 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko,
	Vlastimil Babka, Eric Dumazet, Waiman Long, Suren Baghdasaryan,
	Andrey Konovalov, Alexander Potapenko

On Mon, Sep 05, 2022 at 02:57PM +0200, Marco Elver wrote:
[...]
> > +{
> > +	int i = *pos, ret = 0;
> > +	struct stack_record **stacks, *stack;
> > +	static struct stack_record *last = NULL;
> > +	unsigned long stack_table_entries = stack_hash_mask + 1;
> > +
> > +	/* Continue from the last stack if we have one */
> > +	if (last) {
> > +		stack = last->next;
> 
> This is dead code?

Oof, I just noticed that 'last' is static. Please avoid that, because
it'll make this interface really tricky to use safely. I still don't
quite understand why it needs to do this, and a kernel-doc comment would
make this clearer.



* Re: [PATCH v2 1/3] lib/stackdepot: Add a refcount field in stack_record
  2022-09-05  3:10 ` [PATCH v2 1/3] lib/stackdepot: Add a refcount field in stack_record Oscar Salvador
@ 2022-09-05 20:57   ` Andrey Konovalov
  2022-09-06  3:54     ` Oscar Salvador
  0 siblings, 1 reply; 23+ messages in thread
From: Andrey Konovalov @ 2022-09-05 20:57 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, LKML, Linux Memory Management List, Michal Hocko,
	Vlastimil Babka, Eric Dumazet, Waiman Long, Suren Baghdasaryan,
	Marco Elver, Alexander Potapenko

On Mon, Sep 5, 2022 at 5:10 AM Oscar Salvador <osalvador@suse.de> wrote:
>
> We want to filter out page_owner output and print only those
> stacks that have been repeated beyond a certain threshold.
> This gives us the chance to get rid of a lot of noise.
> In order to do that, we need to keep track of how many repeated stacks
> (for allocation) we have, so we add a new refcount_t field
> in the stack_record struct.
>
> Note that this might increase the size of the struct for some
> architectures.
> E.g.: x86_64 is not affected due to alignment, but 32-bit x86 might be.
>
> The alternative would be to have some kind of struct like this:
>
> struct track_stacks {
>         struct stack_record *stack;
>         struct track_stacks *next;
>         refcount_t stack_count;
> };
>
> But that would imply performing more allocations and gluing everything
> together, which would make the code more complex, so I think that
> going with a new field in struct stack_record is good enough.
>
> Note that on __set_page_owner_handle(), page_owner->handle is set,
> and on __reset_page_owner(), page_owner->free_handle is set.
>
> We are interested in page_owner->handle, so when __set_page_owner()
> gets called, we derive the stack_record struct from page_owner->handle,
> and we increment its refcount_t field; and when __reset_page_owner()
> gets called, we derive its stack_record from page_owner->handle
> and we decrement its refcount_t field.
>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  include/linux/stackdepot.h | 13 ++++++-
>  lib/stackdepot.c           | 79 +++++++++++++++++++++++++++++++-------
>  mm/kasan/common.c          |  3 +-
>  mm/page_owner.c            | 14 +++++--
>  4 files changed, 89 insertions(+), 20 deletions(-)
>
> diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
> index bc2797955de9..4e3a88f135ee 100644
> --- a/include/linux/stackdepot.h
> +++ b/include/linux/stackdepot.h
> @@ -15,9 +15,16 @@
>
>  typedef u32 depot_stack_handle_t;
>
> +enum stack_depot_action {
> +       STACK_DEPOT_ACTION_NONE,
> +       STACK_DEPOT_ACTION_COUNT,
> +};

Hi Oscar,

Why do we need these actions? Why not just increment the refcount on
each stack trace save?

> +
>  depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>                                         unsigned int nr_entries,
> -                                       gfp_t gfp_flags, bool can_alloc);
> +                                       gfp_t gfp_flags, bool can_alloc,
> +                                       enum stack_depot_action action);
> +void stack_depot_dec_count(depot_stack_handle_t handle);
>
>  /*
>   * Every user of stack depot has to call stack_depot_init() during its own init
> @@ -55,6 +62,10 @@ static inline int stack_depot_early_init(void)       { return 0; }
>
>  depot_stack_handle_t stack_depot_save(unsigned long *entries,
>                                       unsigned int nr_entries, gfp_t gfp_flags);
> +depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
> +                                            unsigned int nr_entries,
> +                                            gfp_t gfp_flags,
> +                                            enum stack_depot_action action);
>
>  unsigned int stack_depot_fetch(depot_stack_handle_t handle,
>                                unsigned long **entries);
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index e73fda23388d..a806ef58a385 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -64,6 +64,7 @@ struct stack_record {
>         u32 hash;                       /* Hash in the hastable */
>         u32 size;                       /* Number of frames in the stack */
>         union handle_parts handle;
> +       refcount_t count;               /* Number of the same repeated stacks */
>         unsigned long entries[];        /* Variable-sized array of entries. */
>  };
>
> @@ -140,6 +141,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
>         stack->handle.slabindex = depot_index;
>         stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
>         stack->handle.valid = 1;
> +       refcount_set(&stack->count, 1);
>         memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
>         depot_offset += required_size;
>
> @@ -341,6 +343,29 @@ void stack_depot_print(depot_stack_handle_t stack)
>  }
>  EXPORT_SYMBOL_GPL(stack_depot_print);
>
> +static struct stack_record *stack_depot_getstack(depot_stack_handle_t handle)
> +{
> +       union handle_parts parts = { .handle = handle };
> +       void *slab;
> +       size_t offset = parts.offset << STACK_ALLOC_ALIGN;
> +       struct stack_record *stack;
> +
> > +       if (!handle)
> +               return NULL;
> +
> +       if (parts.slabindex > depot_index) {
> +               WARN(1, "slab index %d out of bounds (%d) for stack id %08x\n",
> +                    parts.slabindex, depot_index, handle);
> +               return NULL;
> +       }
> +       slab = stack_slabs[parts.slabindex];
> +       if (!slab)
> +               return NULL;
> +
> +       stack = slab + offset;
> +       return stack;
> +}
> +
>  /**
>   * stack_depot_fetch - Fetch stack entries from a depot
>   *
> @@ -353,30 +378,42 @@ EXPORT_SYMBOL_GPL(stack_depot_print);
>  unsigned int stack_depot_fetch(depot_stack_handle_t handle,
>                                unsigned long **entries)
>  {
> -       union handle_parts parts = { .handle = handle };
> -       void *slab;
> -       size_t offset = parts.offset << STACK_ALLOC_ALIGN;
>         struct stack_record *stack;
>
>         *entries = NULL;
>         if (!handle)
>                 return 0;
>
> -       if (parts.slabindex > depot_index) {
> -               WARN(1, "slab index %d out of bounds (%d) for stack id %08x\n",
> -                       parts.slabindex, depot_index, handle);
> -               return 0;
> -       }
> -       slab = stack_slabs[parts.slabindex];
> -       if (!slab)
> +       stack = stack_depot_getstack(handle);
> +       if (!stack)
>                 return 0;
> -       stack = slab + offset;
>
>         *entries = stack->entries;
>         return stack->size;
>  }
>  EXPORT_SYMBOL_GPL(stack_depot_fetch);
>
> +static void stack_depot_inc_count(struct stack_record *stack)
> +{
> +       refcount_inc(&stack->count);
> +}
> +
> +void stack_depot_dec_count(depot_stack_handle_t handle)
> +{
> +       struct stack_record *stack = NULL;
> +
> +       stack = stack_depot_getstack(handle);
> +       if (stack) {
> +               /*
> +                * page_owner creates some stacks via create_dummy_stack().
> +                * We are not interested in those, so make sure we only decrement
> +                * "valid" stacks.
> +                */
> +               if (refcount_read(&stack->count) > 1)
> +                       refcount_dec(&stack->count);
> +       }
> +}
> +
>  /**
>   * __stack_depot_save - Save a stack trace from an array
>   *
> @@ -402,7 +439,8 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
>   */
>  depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>                                         unsigned int nr_entries,
> -                                       gfp_t alloc_flags, bool can_alloc)
> +                                       gfp_t alloc_flags, bool can_alloc,
> +                                       enum stack_depot_action action)
>  {
>         struct stack_record *found = NULL, **bucket;
>         depot_stack_handle_t retval = 0;
> @@ -488,8 +526,11 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>                 /* Nobody used this memory, ok to free it. */
>                 free_pages((unsigned long)prealloc, STACK_ALLOC_ORDER);
>         }
> -       if (found)
> +       if (found) {
>                 retval = found->handle.handle;
> +               if (action == STACK_DEPOT_ACTION_COUNT)
> +                       stack_depot_inc_count(found);
> +       }
>  fast_exit:
>         return retval;
>  }
> @@ -511,6 +552,16 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
>                                       unsigned int nr_entries,
>                                       gfp_t alloc_flags)
>  {
> -       return __stack_depot_save(entries, nr_entries, alloc_flags, true);
> +       return __stack_depot_save(entries, nr_entries, alloc_flags, true,
> +                                 STACK_DEPOT_ACTION_NONE);
>  }
>  EXPORT_SYMBOL_GPL(stack_depot_save);
> +
> +depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
> +                                            unsigned int nr_entries,
> +                                            gfp_t alloc_flags,
> +                                            enum stack_depot_action action)
> +{
> +       return __stack_depot_save(entries, nr_entries, alloc_flags, true, action);
> +}
> +EXPORT_SYMBOL_GPL(stack_depot_save_action);
> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> index 69f583855c8b..8077c6e70815 100644
> --- a/mm/kasan/common.c
> +++ b/mm/kasan/common.c
> @@ -36,7 +36,8 @@ depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc)
>         unsigned int nr_entries;
>
>         nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
> -       return __stack_depot_save(entries, nr_entries, flags, can_alloc);
> +       return __stack_depot_save(entries, nr_entries, flags, can_alloc,
> +                                 STACK_DEPOT_ACTION_NONE);
>  }
>
>  void kasan_set_track(struct kasan_track *track, gfp_t flags)
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index e4c6f3f1695b..8730f377fa91 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c

Could you split out the stack depot and the page_owner changes into
separate patches?

> @@ -106,7 +106,8 @@ static inline struct page_owner *get_page_owner(struct page_ext *page_ext)
>         return (void *)page_ext + page_owner_ops.offset;
>  }
>
> -static noinline depot_stack_handle_t save_stack(gfp_t flags)
> +static noinline depot_stack_handle_t save_stack(gfp_t flags,
> +                                               enum stack_depot_action action)
>  {
>         unsigned long entries[PAGE_OWNER_STACK_DEPTH];
>         depot_stack_handle_t handle;
> @@ -125,7 +126,7 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
>         current->in_page_owner = 1;
>
>         nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
> -       handle = stack_depot_save(entries, nr_entries, flags);
> +       handle = stack_depot_save_action(entries, nr_entries, flags, action);
>         if (!handle)
>                 handle = failure_handle;
>
> @@ -138,6 +139,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
>         int i;
>         struct page_ext *page_ext;
>         depot_stack_handle_t handle;
> +       depot_stack_handle_t alloc_handle;
>         struct page_owner *page_owner;
>         u64 free_ts_nsec = local_clock();
>
> @@ -145,7 +147,10 @@ void __reset_page_owner(struct page *page, unsigned short order)
>         if (unlikely(!page_ext))
>                 return;
>
> -       handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
> +       page_owner = get_page_owner(page_ext);
> +       alloc_handle = page_owner->handle;
> +
> +       handle = save_stack(GFP_NOWAIT | __GFP_NOWARN, STACK_DEPOT_ACTION_NONE);
>         for (i = 0; i < (1 << order); i++) {
>                 __clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
>                 page_owner = get_page_owner(page_ext);
> @@ -153,6 +158,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
>                 page_owner->free_ts_nsec = free_ts_nsec;
>                 page_ext = page_ext_next(page_ext);
>         }
> +       stack_depot_dec_count(alloc_handle);
>  }
>
>  static inline void __set_page_owner_handle(struct page_ext *page_ext,
> @@ -189,7 +195,7 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
>         if (unlikely(!page_ext))
>                 return;
>
> -       handle = save_stack(gfp_mask);
> +       handle = save_stack(gfp_mask, STACK_DEPOT_ACTION_COUNT);
>         __set_page_owner_handle(page_ext, handle, order, gfp_mask);
>  }
>
> --
> 2.35.3
>



* Re: [PATCH v2 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter
  2022-09-05  3:10 ` [PATCH v2 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter Oscar Salvador
  2022-09-05 12:57   ` Marco Elver
@ 2022-09-05 22:20   ` kernel test robot
  1 sibling, 0 replies; 23+ messages in thread
From: kernel test robot @ 2022-09-05 22:20 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: kbuild-all, Linux Memory Management List, linux-kernel,
	Michal Hocko, Vlastimil Babka, Eric Dumazet, Waiman Long,
	Suren Baghdasaryan, Marco Elver, Andrey Konovalov,
	Alexander Potapenko, Oscar Salvador

Hi Oscar,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.0-rc4]
[cannot apply to akpm-mm/mm-everything next-20220901]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Oscar-Salvador/page_owner-print-stacks-and-their-counter/20220905-111139
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 7e18e42e4b280c85b76967a9106a13ca61c16179
config: microblaze-randconfig-m041-20220905 (https://download.01.org/0day-ci/archive/20220906/202209060655.H9CK1KgC-lkp@intel.com/config)
compiler: microblaze-linux-gcc (GCC) 12.1.0

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

smatch warnings:
lib/stackdepot.c:586 stack_depot_print_stacks_threshold() warn: unsigned 'stack->size' is never less than zero.

vim +586 lib/stackdepot.c

   568	
   569	int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos)
   570	{
   571		int i = *pos, ret = 0;
   572		struct stack_record **stacks, *stack;
   573		static struct stack_record *last = NULL;
   574		unsigned long stack_table_entries = stack_hash_mask + 1;
   575	
   576		/* Continue from the last stack if we have one */
   577		if (last) {
   578			stack = last->next;
   579		} else {
   580	new_table:
   581			stacks = &stack_table[i];
   582			stack = (struct stack_record *)stacks;
   583		}
   584	
   585		for (; stack; stack = stack->next) {
 > 586			if (!stack->size || stack->size < 0 ||
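
The warning is about a tautology: a u32 can never be below zero, so the
`stack->size < 0` arm is dead code. A minimal userspace demonstration
(plain C, not the kernel code):

```c
/* u32 is unsigned, so "size < 0" can never be true. */
int is_negative_u32(unsigned int size)
{
	return size < 0; /* constant false; this is what smatch flags */
}
```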

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp



* Re: [PATCH v2 1/3] lib/stackdepot: Add a refcount field in stack_record
  2022-09-05 20:57   ` Andrey Konovalov
@ 2022-09-06  3:54     ` Oscar Salvador
  2022-09-10 22:33       ` Andrey Konovalov
  0 siblings, 1 reply; 23+ messages in thread
From: Oscar Salvador @ 2022-09-06  3:54 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: Andrew Morton, LKML, Linux Memory Management List, Michal Hocko,
	Vlastimil Babka, Eric Dumazet, Waiman Long, Suren Baghdasaryan,
	Marco Elver, Alexander Potapenko

On Mon, Sep 05, 2022 at 10:57:20PM +0200, Andrey Konovalov wrote:
> On Mon, Sep 5, 2022 at 5:10 AM Oscar Salvador <osalvador@suse.de> wrote:
> > +enum stack_depot_action {
> > +       STACK_DEPOT_ACTION_NONE,
> > +       STACK_DEPOT_ACTION_COUNT,
> > +};
> 
> Hi Oscar,

Hi Andrey

> Why do we need these actions? Why not just increment the refcount on
> each stack trace save?

Let me try to explain it.

Back in RFC, there were no actions and the refcount
was incremented/decremented in __set_page_ownwer()
and __reset_page_owner() functions.

This led to a performance "problem", where you would
look up the stack twice: once when saving it
and once when incrementing its count.

We figured we could do better and, at least for __set_page_owner(),
we could look up the stacktrace just once, when calling __stack_depot_save(),
and increment its count directly there.

We cannot do that for __reset_page_owner(), because the stack we are
saving is the freeing stacktrace, and we are not interested in that.
That is why __reset_page_owner() does:

 <---
 depot_stack_handle_t alloc_handle;

 ...
 alloc_handle = page_owner->handle;
 handle = save_stack(GFP_NOWAIT | __GFP_NOWARN, STACK_DEPOT_ACTION_NONE);
 page_owner->free_handle = handle;
 stack_depot_dec_count(alloc_handle);
 --->

alloc_handle contains the handle for the allocation stacktrace, which was set
in __set_page_owner(), and page_owner->free handle contains the handle for the
freeing stacktrace.
But we are only interested in the allocation stack and we only want to increment/decrement
that on allocation/free.
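
In short, the allocation handle is the only one whose count moves. A
userspace model of that asymmetry (made-up types; not the kernel API):

```c
struct stack { int count; };

/* Model of __set_page_owner(): saving the allocation stack bumps its count. */
void set_owner(struct stack *alloc_stack)
{
	alloc_stack->count++;
}

/*
 * Model of __reset_page_owner(): the freeing stack is recorded but never
 * counted; only the remembered allocation stack is decremented, and, as
 * in stack_depot_dec_count(), never below 1.
 */
void reset_owner(struct stack *alloc_stack, struct stack *free_stack)
{
	(void)free_stack;
	if (alloc_stack->count > 1)
		alloc_stack->count--;
}
```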


> Could you split out the stack depot and the page_owner changes into
> separate patches?

I could, but I am not sure it would make the review any easier,
as you could not match stackdepot <-> page_owner changes.

And we would be adding a bunch of code that would not be used till later on.
But I can try it out if there is a strong opinion.

thanks for your time!


-- 
Oscar Salvador
SUSE Labs



* Re: [PATCH v2 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter
  2022-09-05 12:57   ` Marco Elver
  2022-09-05 13:00     ` Marco Elver
@ 2022-09-06  7:43     ` Oscar Salvador
  2022-09-06  8:35       ` Marco Elver
  1 sibling, 1 reply; 23+ messages in thread
From: Oscar Salvador @ 2022-09-06  7:43 UTC (permalink / raw)
  To: Marco Elver
  Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko,
	Vlastimil Babka, Eric Dumazet, Waiman Long, Suren Baghdasaryan,
	Andrey Konovalov, Alexander Potapenko

On Mon, Sep 05, 2022 at 02:57:50PM +0200, Marco Elver wrote:
> On Mon, Sep 05, 2022 at 05:10AM +0200, Oscar Salvador wrote:
> [...]
> > +int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos)
> 
> Can you add kernel-doc comment what this does (and also update
> accordingly in 3/3 when you add 'threshold').

Yes, I guess a kernel-doc comment is due.

> From what I see it prints *all* stacks that have a non-zero count.
> Correct?

That's right.

> If so, should this be called stack_depot_print_all_count() (having
> stack(s) in the name twice doesn't make it more obvious what it does)?
> Then in the follow-up patch you add the 'threshold' arg.

I guess so. The only reason I went with the actual name is that, for me,
"stack_depot" was kinda the name of the module/library, and
so I wanted to make crystal clear what we were printing.

But I'm ok with renaming it if it's already self-explanatory.

> > +{
> > +	int i = *pos, ret = 0;
> > +	struct stack_record **stacks, *stack;
> > +	static struct stack_record *last = NULL;
> > +	unsigned long stack_table_entries = stack_hash_mask + 1;
> > +
> > +	/* Continue from the last stack if we have one */
> > +	if (last) {
> > +		stack = last->next;
> 
> This is dead code?

No, more below.

> Either I'm missing something really obvious, or the above function can
> be simplified to just this (untested!):
> 
> 	int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos)
> 	{
> 		const unsigned long stack_table_entries = stack_hash_mask + 1;
> 
> 		/* Iterate over all tables for valid stacks. */
> 		for (; *pos < stack_table_entries; (*pos)++) {
> 			for (struct stack_record *stack = stack_table[*pos]; stack; stack = stack->next) {
> 				if (!stack->size || stack->size < 0 || stack->size > size ||
> 				    stack->handle.valid != 1 || refcount_read(&stack->count) < 1)
> 					continue;
> 
> > 				int ret = stack_trace_snprint(buf, size, stack->entries, stack->size, 0);
> > 
> > 				ret += scnprintf(buf + ret, size - ret, "stack count: %d\n\n",
> > 						 refcount_read(&stack->count));
> > 				return ret;
> 			}
> 		}
> 
> 		return 0;

Yes, this will not work.

You have stack_table[], which is an array of struct stack_record, and each
stack has a pointer to its next stack, so you can walk from the beginning
of a specific table till the end, e.g.:

stack_table[0] = {stack1, stack2, stack3, ...} (each linked by ->next)
stack_table[1] = {stack1, stack2, stack3, ...} (each linked by ->next)
..
stack_table[stack_table_entries - 1] = {stack1, stack2, stack3, ...} (each linked by ->next)

*pos holds the index of stack_table[], while "last" holds the last stack within
the table we were processing.

So, when we find a valid stack to print, we set "last" to that stack, and
*pos to the index of stack_table.
So, when we call stack_depot_print_stacks_threshold() again, we set "stack"
to "last"->next, and we are ready to keep looking with:

 for (; stack; stack = stack->next) {
    ...
    check if stack is valid
 }

Should we not find any more valid stacks in that stack_table, we need to
check the next table, so we do:

	i++;  /* note that i was set to *pos at the beginning of the function */
	*pos = i;
	last = NULL;
	goto new_table;

and now we are ready to do:

new_table:
		stacks = &stack_table[i];
		stack = (struct stack_record *)stacks;


Does this clarify it a little bit?
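
To put the above in compilable form, here is a userspace model of the
resumable scan (hypothetical node type, plain C; the static cursor plays
the role of "last"):

```c
#include <stddef.h>

struct node {
	int count;
	struct node *next;
};

#define TABLE_ENTRIES 4
struct node *table[TABLE_ENTRIES];

/*
 * Resumable scan: "last" remembers the node returned by the previous
 * call, so the next call continues at last->next instead of rescanning
 * bucket *pos from its head; when a bucket is exhausted, move on to the
 * next one and reset "last".
 */
struct node *next_counted(long *pos)
{
	static struct node *last;
	struct node *n = last ? last->next : table[*pos];

	for (;;) {
		for (; n; n = n->next) {
			if (n->count > 0) {
				last = n;
				return n;
			}
		}
		if (++(*pos) >= TABLE_ENTRIES)
			return NULL;
		last = NULL;
		n = table[*pos];
	}
}
```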

About using static vs non-static.
In the v1, I was using a parameter which contained last_stack:

https://patchwork.kernel.org/project/linux-mm/patch/20220901044249.4624-3-osalvador@suse.de/

Not sure if that's better? Thoughts?


-- 
Oscar Salvador
SUSE Labs



* Re: [PATCH v2 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter
  2022-09-06  7:43     ` Oscar Salvador
@ 2022-09-06  8:35       ` Marco Elver
  2022-09-07  4:00         ` Oscar Salvador
  0 siblings, 1 reply; 23+ messages in thread
From: Marco Elver @ 2022-09-06  8:35 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko,
	Vlastimil Babka, Eric Dumazet, Waiman Long, Suren Baghdasaryan,
	Andrey Konovalov, Alexander Potapenko

On Tue, 6 Sept 2022 at 09:44, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Mon, Sep 05, 2022 at 02:57:50PM +0200, Marco Elver wrote:
> > On Mon, Sep 05, 2022 at 05:10AM +0200, Oscar Salvador wrote:
> > [...]
> > > +int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos)
> >
> > Can you add kernel-doc comment what this does (and also update
> > accordingly in 3/3 when you add 'threshold').
>
> Yes, I guess a kernel-doc comment is due.
>
> > From what I see it prints *all* stacks that have a non-zero count.
> > Correct?
>
> That's right.
>
> > If so, should this be called stack_depot_print_all_count() (having
> > stack(s) in the name twice doesn't make it more obvious what it does)?
> > Then in the follow-up patch you add the 'threshold' arg.
>
> I guess so. The only reason I went with the actual name is that for me
> "stack_depot" was kinda the name of the module/library, and
so I wanted to make crystal clear what we were printing.
>
> But I'm ok with renaming it if it's already self-explanatory

I think it's clear from the fact we're using the stack depot that any
printing will print stacks. To mirror the existing
'stack_depot_print()', I'd go with 'stack_depot_print_all_count()'.


> > > +{
> > > +   int i = *pos, ret = 0;
> > > +   struct stack_record **stacks, *stack;
> > > +   static struct stack_record *last = NULL;
> > > +   unsigned long stack_table_entries = stack_hash_mask + 1;
> > > +
> > > +   /* Continue from the last stack if we have one */
> > > +   if (last) {
> > > +           stack = last->next;
> >
> > This is dead code?
>
> No, more below.
>
> > Either I'm missing something really obvious, but I was able to simplify
> > the above function to just this (untested!):
> >
> >       int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos)
> >       {
> >               const unsigned long stack_table_entries = stack_hash_mask + 1;
> >
> >               /* Iterate over all tables for valid stacks. */
> >               for (; *pos < stack_table_entries; (*pos)++) {
> >                       for (struct stack_record *stack = stack_table[*pos]; stack; stack = stack->next) {
> >                               if (!stack->size || stack->size < 0 || stack->size > size ||
> >                                   stack->handle.valid != 1 || refcount_read(&stack->count) < 1)
> >                                       continue;
> >
> >                               return stack_trace_snprint(buf, size, stack->entries, stack->size, 0) +
> >                                      scnprintf(buf + ret, size - ret, "stack count: %d\n\n",
> >                                                refcount_read(&stack->count));
> >                       }
> >               }
> >
> >               return 0;
>
> Yes, this will not work.
>
> You have stack_table[], which is an array of struct stacks, and each struct
> stack has a pointer to its next stack, so you can walk from the beginning of a
> specific table till the end, e.g.:
>
> stack_table[0] = {stack1, stack2, stack3, ...} (each linked by ->next)
> stack_table[1] = {stack1, stack2, stack3, ...} (each linked by ->next)
> ..
> stack_table[stack_table_entries - 1] = {stack1, stack2, stack3, ...} (each linked by ->next)
>
> *pos holds the index of stack_table[], while "last" holds the last stack within
> the table we were processing.
>
> So, when we find a valid stack to print, set "last" to that stack, and *pos to the index
> of stack_table.
> So, when we call stack_depot_print_stacks_threshold() again, we set "stack" to "last"->next,
> and we are ready to keep looking with:
>
>  for (; stack; stack = stack->next) {
>     ...
>     check if stack is valid
>  }
>
> Should we not find any more valid stacks in that stack_table, we need to check
> the next table, so we do:
>
>     i++; (note that i was set to *pos at the beginning of the function)
>         *pos = i;
>         last = NULL;
>     goto new_table
>
> and now we are ready to do:
>
> new_table:
>                 stacks = &stack_table[i];
>                 stack = (struct stack_record *)stacks;
>
>
> Does this clarify it a little bit?
>
> About using static vs non-static.
> In the v1, I was using a parameter which contained last_stack:
>
> https://patchwork.kernel.org/project/linux-mm/patch/20220901044249.4624-3-osalvador@suse.de/
>
> Not sure if that's better? Thoughts?

Moderately better, but still not great. Essentially you need 2
cursors, but with loff_t you only get 1.

I think the loff_t parameter can be used to encode both cursors. In
the kernel, loff_t is always 'long long', so it'll always be 64-bit.

Let's assume that collisions in the hash table are rare, so the number
of stacks per bucket are typically small. Then you can encode the
index into the bucket in bits 0-31 and the bucket index in bits 32-63.
STACK_HASH_ORDER_MAX is 20, so 32 bits is plenty to encode the index.
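The bit-packing described here can be sketched in plain C like so (a hypothetical userspace illustration, not code from the patch; the function names are made up):

```c
#include <assert.h>
#include <stdint.h>

/* Pack the position inside a bucket (bits 0-31) and the bucket index
 * (bits 32-63) into a single 64-bit cursor. */
int64_t encode_pos(uint32_t bucket, uint32_t idx)
{
	return ((int64_t)bucket << 32) | idx;
}

void decode_pos(int64_t pos, uint32_t *bucket, uint32_t *idx)
{
	*idx = (uint32_t)(pos & 0xffffffff);
	*bucket = (uint32_t)(pos >> 32);
}
```

With STACK_HASH_ORDER_MAX at 20, the largest bucket index is 2^20 - 1, which fits comfortably in the upper 32 bits.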



* Re: [PATCH v2 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter
  2022-09-06  8:35       ` Marco Elver
@ 2022-09-07  4:00         ` Oscar Salvador
  2022-09-07  7:14           ` Marco Elver
  0 siblings, 1 reply; 23+ messages in thread
From: Oscar Salvador @ 2022-09-07  4:00 UTC (permalink / raw)
  To: Marco Elver
  Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko,
	Vlastimil Babka, Eric Dumazet, Waiman Long, Suren Baghdasaryan,
	Andrey Konovalov, Alexander Potapenko

On Tue, Sep 06, 2022 at 10:35:00AM +0200, Marco Elver wrote:
> I think it's clear from the fact we're using the stack depot that any
> printing will print stacks. To mirror the existing
> 'stack_depot_print()', I'd go with 'stack_depot_print_all_count()'.

Fair enough, I will rename it then.
 
> Moderately better, but still not great. Essentially you need 2
> cursors, but with loff_t you only get 1.
> 
> I think the loff_t parameter can be used to encode both cursors. In
> the kernel, loff_t is always 'long long', so it'll always be 64-bit.
> 
> Let's assume that collisions in the hash table are rare, so the number
> of stacks per bucket are typically small. Then you can encode the
> index into the bucket in bits 0-31 and the bucket index in bits 32-63.
> STACK_HASH_ORDER_MAX is 20, so 32 bits is plenty to encode the index.

I see, I didn't think of it to be honest.

Then, the below (completely untested) should do the trick:

<----
 int stack_depot_print_all_count(char *buf, size_t size, loff_t *pos)
 {
         int ret = 0, stack_i, table_i;
         struct stack_record **stacks, *stack;
         unsigned long stack_table_entries = stack_hash_mask + 1;
 
         stack_i = (*pos & 31);
         table_i = (*pos >> 32);
 new_table:
         stacks = &stack_table[table_i];
         stack = ((struct stack_record *)stacks) + stack_i;
 
         for (; stack; stack = stack->next, stack_i++) {
                 if (!stack->size || stack->size < 0 ||
                     stack->size > size || stack->handle.valid != 1 ||
                     refcount_read(&stack->count) < 1)
                         continue;
 
                 ret += stack_trace_snprint(buf, size, stack->entries, stack->size, 0);
                 ret += scnprintf(buf + ret, size - ret, "stack count: %d\n\n",
                                  refcount_read(&stack->count));
                 *pos |= stack_i;
                 *pos |= ((long long)table_i << 32);
                 return ret;
         }
 
         table_i++;
         /* Keep looking all tables for valid stacks */
         if (table_i < stack_table_entries)
                 goto new_table;
 
         return 0;
 }
---->

I will give it a go.

Thanks Marco!


-- 
Oscar Salvador
SUSE Labs



* Re: [PATCH v2 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter
  2022-09-07  4:00         ` Oscar Salvador
@ 2022-09-07  7:14           ` Marco Elver
  2022-09-08  3:32             ` Oscar Salvador
  0 siblings, 1 reply; 23+ messages in thread
From: Marco Elver @ 2022-09-07  7:14 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko,
	Vlastimil Babka, Eric Dumazet, Waiman Long, Suren Baghdasaryan,
	Andrey Konovalov, Alexander Potapenko

On Wed, 7 Sept 2022 at 06:00, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Tue, Sep 06, 2022 at 10:35:00AM +0200, Marco Elver wrote:
> > I think it's clear from the fact we're using the stack depot that any
> > printing will print stacks. To mirror the existing
> > 'stack_depot_print()', I'd go with 'stack_depot_print_all_count()'.
>
> Fair enough, I will rename it then.
>
> > Moderately better, but still not great. Essentially you need 2
> > cursors, but with loff_t you only get 1.
> >
> > I think the loff_t parameter can be used to encode both cursors. In
> > the kernel, loff_t is always 'long long', so it'll always be 64-bit.
> >
> > Let's assume that collisions in the hash table are rare, so the number
> > of stacks per bucket are typically small. Then you can encode the
> > index into the bucket in bits 0-31 and the bucket index in bits 32-63.
> > STACK_HASH_ORDER_MAX is 20, so 32 bits is plenty to encode the index.
>
> I see, I didn't think of it to be honest.
>
> Then, the below (completely untested) should do the trick:
>
> <----
>  int stack_depot_print_all_count(char *buf, size_t size, loff_t *pos)
>  {
>          int ret = 0, stack_i, table_i;
>          struct stack_record **stacks, *stack;
>          unsigned long stack_table_entries = stack_hash_mask + 1;
>
>          stack_i = (*pos & 31);
>          table_i = (*pos >> 32);
>  new_table:
>          stacks = &stack_table[table_i];
>          stack = ((struct stack_record *)stacks) + stack_i;

Why are you casting a stack_record** to a stack_record*? stack_table
is already appropriately typed, and there should be no need to cast
things around.

'stacks' is supposed to be the bucket? In which case you need to
dereference it to get the first entry in the bucket: bucket =
stack_table[table_i];

stack_i cannot be used to index into the bucket, because the elements
in it are a linked list and not necessarily adjacent in memory. You
have to traverse the linked list stack_i elements to get to the start:

  for (int i = 0; stack && i < stack_i; stack = stack->next, ++i);

then you can proceed with the below code.

>          for (; stack; stack = stack->next, stack_i++) {
>                  if (!stack->size || stack->size < 0 ||
>                      stack->size > size || stack->handle.valid != 1 ||
>                      refcount_read(&stack->count) < 1)
>                          continue;
>
>                  ret += stack_trace_snprint(buf, size, stack->entries, stack->size, 0);
>                  ret += scnprintf(buf + ret, size - ret, "stack count: %d\n\n",
>                                   refcount_read(&stack->count));
>                  *pos |= stack_i;
>                  *pos |= ((long long)table_i << 32);
>                  return ret;
>          }
>
>          table_i++;
>          /* Keep looking all tables for valid stacks */
>          if (table_i < stack_table_entries)
>                  goto new_table;

While you're at it, could you try to come up with a version that
avoids the goto?

>          return 0;
>  }
> ---->
>
> I will give it a go.
>
> Thanks Marco!
>
>
> --
> Oscar Salvador
> SUSE Labs
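The goto-free, resumable two-level walk Marco asks for above can be modelled on a toy chained hash table (hypothetical sketch; struct rec, table and the valid flag merely stand in for the real stack depot structures):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the stack depot structures. */
struct rec {
	int value;		/* stands in for the stack's payload */
	int valid;		/* stands in for the validity checks */
	struct rec *next;
};

#define TABLE_ENTRIES 4
struct rec *table[TABLE_ENTRIES];

/*
 * Return the next valid record and advance the 64-bit cursor, which
 * encodes the bucket index in bits 32-63 and the position inside the
 * bucket in bits 0-31.  Returns NULL once the whole table is walked.
 */
struct rec *next_valid(int64_t *pos)
{
	uint32_t table_i = (uint32_t)(*pos >> 32);
	uint32_t rec_i = (uint32_t)(*pos & 0xffffffff);

	for (; table_i < TABLE_ENTRIES; table_i++, rec_i = 0) {
		struct rec *r = table[table_i];

		/* Re-walk the chain up to where the previous call stopped. */
		for (uint32_t i = 0; r && i < rec_i; r = r->next, i++)
			;

		for (; r; r = r->next, rec_i++) {
			if (!r->valid)
				continue;
			/* Resume after this record on the next call. */
			*pos = ((int64_t)table_i << 32) | (rec_i + 1);
			return r;
		}
	}
	return NULL;
}
```

Restarting at rec_i = 0 whenever the outer loop advances to the next bucket is what removes the need for the goto.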



* Re: [PATCH v2 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter
  2022-09-07  7:14           ` Marco Elver
@ 2022-09-08  3:32             ` Oscar Salvador
  2022-09-08  5:31               ` Marco Elver
  0 siblings, 1 reply; 23+ messages in thread
From: Oscar Salvador @ 2022-09-08  3:32 UTC (permalink / raw)
  To: Marco Elver
  Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko,
	Vlastimil Babka, Eric Dumazet, Waiman Long, Suren Baghdasaryan,
	Andrey Konovalov, Alexander Potapenko

On Wed, Sep 07, 2022 at 09:14:35AM +0200, Marco Elver wrote:
> Why are you casting a stack_record** to a stack_record*? stack_table
> is already appropriately typed, and there should be no need to cast
> things around.
> 
> 'stacks' is supposed to be the bucket? In which case you need to
> dereference it to get the first entry in the bucket: bucket =
> stack_table[table_i];
> 
> stack_i cannot be used to index into the bucket, because the elements
> in it are a linked list and not necessarily adjacent in memory. You
> have to traverse the linked list stack_i elements to get to the start:

Yea, I figured that much after thinking about it more, but I was overly
eager.

>   for (int i = 0; stack && i < stack_i; stack = stack->next, ++i);

But this seems suboptimal.
With this code, we have to walk the list till we find the right index
every time we enter the function, while the actual code of v2
or even the patch from v1 [1], we do not really need to do that
because we already have the pointer to the stack.

So I much rather prefer that, than having to traverse the stacks
till the find the right one.



-- 
Oscar Salvador
SUSE Labs



* Re: [PATCH v2 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter
  2022-09-08  3:32             ` Oscar Salvador
@ 2022-09-08  5:31               ` Marco Elver
  0 siblings, 0 replies; 23+ messages in thread
From: Marco Elver @ 2022-09-08  5:31 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko,
	Vlastimil Babka, Eric Dumazet, Waiman Long, Suren Baghdasaryan,
	Andrey Konovalov, Alexander Potapenko

On Thu, 8 Sept 2022 at 05:32, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Wed, Sep 07, 2022 at 09:14:35AM +0200, Marco Elver wrote:
> > Why are you casting a stack_record** to a stack_record*? stack_table
> > is already appropriately typed, and there should be no need to cast
> > things around.
> >
> > 'stacks' is supposed to be the bucket? In which case you need to
> > dereference it to get the first entry in the bucket: bucket =
> > stack_table[table_i];
> >
> > stack_i cannot be used to index into the bucket, because the elements
> > in it are a linked list and not necessarily adjacent in memory. You
> > have to traverse the linked list stack_i elements to get to the start:
>
> Yea, I figured that much after thinking about more, but I was overly
> eager.
>
> >   for (int i = 0; stack && i < stack_i; stack = stack->next, ++i);
>
> But this seems suboptimal.
> With this code, we have to walk the list till we find the right index
> every time we enter the function, while the actual code of v2
> or even the patch from v1 [1], we do not really need to do that
> because we already have the pointer to the stack.
>
> So I much rather prefer that, than having to traverse the stacks
> till the find the right one.

I would not prematurely optimize this. It's a hash map, and the only
problem is if there are tons of collisions. Also, this function isn't
performance critical, it's only used for printing, which itself is
slow.

I suggest you collect some stats on how many entries each bucket has on
average. If the average is <10, I'd go with the cleaner interface.
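Gathering those stats could be as simple as a throwaway helper along these lines (hypothetical sketch over a generic chained table, not kernel code):

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next;
};

/* Average chain length over the non-empty buckets of a chained hash
 * table; 0 when every bucket is empty. */
unsigned long avg_bucket_len(struct node **table, unsigned long entries)
{
	unsigned long nodes = 0, used = 0;

	for (unsigned long i = 0; i < entries; i++) {
		struct node *n = table[i];

		if (n)
			used++;
		for (; n; n = n->next)
			nodes++;
	}
	return used ? nodes / used : 0;
}
```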



* Re: [PATCH v2 1/3] lib/stackdepot: Add a refcount field in stack_record
  2022-09-06  3:54     ` Oscar Salvador
@ 2022-09-10 22:33       ` Andrey Konovalov
  2022-09-19 15:01         ` Vlastimil Babka
  0 siblings, 1 reply; 23+ messages in thread
From: Andrey Konovalov @ 2022-09-10 22:33 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, LKML, Linux Memory Management List, Michal Hocko,
	Vlastimil Babka, Eric Dumazet, Waiman Long, Suren Baghdasaryan,
	Marco Elver, Alexander Potapenko

On Tue, Sep 6, 2022 at 5:54 AM Oscar Salvador <osalvador@suse.de> wrote:
>
> On Mon, Sep 05, 2022 at 10:57:20PM +0200, Andrey Konovalov wrote:
> > On Mon, Sep 5, 2022 at 5:10 AM Oscar Salvador <osalvador@suse.de> wrote:
> > > +enum stack_depot_action {
> > > +       STACK_DEPOT_ACTION_NONE,
> > > +       STACK_DEPOT_ACTION_COUNT,
> > > +};
> >
> > Hi Oscar,
>
> Hi Andrey
>
> > Why do we need these actions? Why not just increment the refcount on
> > each stack trace save?
>
> Let me try to explain it.
>
> Back in RFC, there were no actions and the refcount
> was incremented/decremented in __set_page_ownwer()
> and __reset_page_owner() functions.
>
> This led to a performance "problem", where you would
> look for the stack twice, one when save it
> and one when increment it.

I don't get this. If you increment the refcount at the same moment
when you save a stack trace, why do you need to do the lookup twice?

> We figured we could do better and, at least, for the __set_page_owner()
> we could look just once for the stacktrace when calling __stack_depot_save,
> and increment it directly there.

Exactly.

> We cannot do that for __reset_page_owner(), because the stack we are
> saving is the freeing stacktrace, and we are not interested in that.
> That is why __reset_page_owner() does:
>
>  <---
>  depot_stack_handle_t alloc_handle;
>
>  ...
>  alloc_handle = page_owner->handle;
>  handle = save_stack(GFP_NOWAIT | __GFP_NOWARN, STACK_DEPOT_ACTION_NONE);
>  page_owner->free_handle = handle
>  stack_depot_dec_count(alloc_handle);
>  --->
>
> alloc_handle contains the handle for the allocation stacktrace, which was set
> in __set_page_owner(), and page_owner->free handle contains the handle for the
> freeing stacktrace.
> But we are only interested in the allocation stack and we only want to increment/decrement
> that on allocation/free.

But what is the problem with incrementing the refcount for free
stacktrace in __reset_page_owner? You save the stack there anyway.

Or is this because you don't want to see free stack traces when
filtering for bigger refcounts? I would argue that this is not a thing
stack depot should care about. If there's a refcount, it should
reflect the true number of references.

Perhaps, what you need is some kind of per-stack trace user metadata.
And the details of what format this metadata takes shouldn't be
present in the stack depot code.

> > Could you split out the stack depot and the page_owner changes into
> > separate patches?
>
> I could, I am not sure if it would make the review any easier though,
> as you could not match stackdepot <-> page_owner changes.
>
> And we should be adding a bunch of code that would not be used till later on.
> But I can try it out if there is a strong opinion.

Yes, splitting would be quite useful. It allows backporting stack
depot changes without having to backport any page_owner code. As long
as there are no breaking interface changes, of course.

Thanks!



* Re: [PATCH v2 1/3] lib/stackdepot: Add a refcount field in stack_record
  2022-09-10 22:33       ` Andrey Konovalov
@ 2022-09-19 15:01         ` Vlastimil Babka
  0 siblings, 0 replies; 23+ messages in thread
From: Vlastimil Babka @ 2022-09-19 15:01 UTC (permalink / raw)
  To: Andrey Konovalov, Oscar Salvador
  Cc: Andrew Morton, LKML, Linux Memory Management List, Michal Hocko,
	Eric Dumazet, Waiman Long, Suren Baghdasaryan, Marco Elver,
	Alexander Potapenko

On 9/11/22 00:33, Andrey Konovalov wrote:
>> We cannot do that for __reset_page_owner(), because the stack we are
>> saving is the freeing stacktrace, and we are not interested in that.
>> That is why __reset_page_owner() does:
>>
>>  <---
>>  depot_stack_handle_t alloc_handle;
>>
>>  ...
>>  alloc_handle = page_owner->handle;
>>  handle = save_stack(GFP_NOWAIT | __GFP_NOWARN, STACK_DEPOT_ACTION_NONE);
>>  page_owner->free_handle = handle
>>  stack_depot_dec_count(alloc_handle);
>>  --->
>>
>> alloc_handle contains the handle for the allocation stacktrace, which was set
>> in __set_page_owner(), and page_owner->free_handle contains the handle for the
>> freeing stacktrace.
>> But we are only interested in the allocation stack and we only want to increment/decrement
>> that on allocation/free.
> 
> But what is the problem with incrementing the refcount for free
> stacktrace in __reset_page_owner? You save the stack there anyway.
> 
> Or is this because you don't want to see free stack traces when
> filtering for bigger refcounts? I would argue that this is not a thing
> stack depot should care about. If there's a refcount, it should
> reflect the true number of references.

But to keep this really a "true number" we would have to now decrement the
counter when the page gets freed by a different stack trace, i.e. we replace
the previously stored free_handle with another one. Which is possible, but
seems like a waste of CPU?

> Perhaps, what you need is some kind of per-stack trace user metadata.
> And the details of what format this metadata takes shouldn't be
> present in the stack depot code.

I was thinking ideally we might want fully separate stackdepot storages
per-stack trace user, i.e. separate hashmaps and dynamically allocated
handles instead of the static "slabs" array. Different users tend to have
distinct stacks so that would make lookups more effective too. Only
CONFIG_STACKDEPOT_ALWAYS_INIT users (KASAN) would keep using the static
array due to their inherent limitations wrt dynamic allocations.

Then these different stackdepot instances could be optionally refcounted or
not, and if need be, have additional metadata as you suggest.

>> > Could you split out the stack depot and the page_owner changes into
>> > separate patches?
>>
>> I could, I am not sure if it would make the review any easier though,
>> as you could not match stackdepot <-> page_owner changes.
>>
>> And we should be adding a bunch of code that would not be used till later on.
>> But I can try it out if there is a strong opinion.
> 
> Yes, splitting would be quite useful. It allows backporting stack
> depot changes without having to backport any page_owner code. As long
> as there are no breaking interface changes, of course.
> 
> Thanks!
> 




* Re: [PATCH v2 3/3] mm,page_owner: Filter out stacks by a threshold counter
  2022-09-05  3:10 ` [PATCH v2 3/3] mm,page_owner: Filter out stacks by a threshold counter Oscar Salvador
  2022-09-05 10:51   ` Ammar Faizi
@ 2022-09-19 15:23   ` Vlastimil Babka
  1 sibling, 0 replies; 23+ messages in thread
From: Vlastimil Babka @ 2022-09-19 15:23 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Eric Dumazet, Waiman Long,
	Suren Baghdasaryan, Marco Elver, Andrey Konovalov,
	Alexander Potapenko

On 9/5/22 05:10, Oscar Salvador wrote:
> We want to be able to filter out the output on a threshold basis,
> in this way we can get rid of a lot of noise and focus only on those
> stacks which have an allegedly high counter.
> 
> We can control the threshold value by a new file called
> 'page_owner_threshold', which is 0 by default.

The name could suggest it has something to do with "page_owner" but in fact
it only affects "page_owner_stacks".
So maybe "page_owner_stacks_threshold" ? But now it's rather long. Or maybe
"page_owner_stacks_min_count" ? Also long but maybe the most self-evident?

> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  include/linux/stackdepot.h |  3 ++-
>  lib/stackdepot.c           |  6 +++--
>  mm/page_owner.c            | 51 +++++++++++++++++++++++++++++++++++++-
>  3 files changed, 56 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
> index 19d3f8295df8..742038216cd0 100644
> --- a/include/linux/stackdepot.h
> +++ b/include/linux/stackdepot.h
> @@ -25,7 +25,8 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>  					gfp_t gfp_flags, bool can_alloc,
>  					enum stack_depot_action action);
>  void stack_depot_dec_count(depot_stack_handle_t handle);
> -int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos);
> +int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos,
> +				       unsigned long threshold);
>  
>  /*
>   * Every user of stack depot has to call stack_depot_init() during its own init
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index a198b2dbe3fb..a31e882853ab 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -566,7 +566,8 @@ depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
>  }
>  EXPORT_SYMBOL_GPL(stack_depot_save_action);
>  
> -int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos)
> +int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos,
> +				       unsigned long threshold)
>  {
>  	int i = *pos, ret = 0;
>  	struct stack_record **stacks, *stack;
> @@ -585,7 +586,8 @@ int stack_depot_print_stacks_threshold(char *buf, size_t size, loff_t *pos)
>  	for (; stack; stack = stack->next) {
>  		if (!stack->size || stack->size < 0 ||
>  		    stack->size > size || stack->handle.valid != 1 ||
> -		    refcount_read(&stack->count) < 1)
> +		    refcount_read(&stack->count) < 1 ||
> +		    refcount_read(&stack->count) < threshold)
>  			continue;
>  
>  		ret += stack_trace_snprint(buf, size, stack->entries, stack->size, 0);
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index d88e6b4aefa0..5b895d347c5f 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -43,6 +43,8 @@ static depot_stack_handle_t early_handle;
>  
>  static void init_early_allocated_pages(void);
>  
> +static unsigned long threshold;
> +
>  static int __init early_page_owner_param(char *buf)
>  {
>  	int ret = kstrtobool(buf, &page_owner_enabled);
> @@ -675,7 +677,7 @@ static ssize_t read_page_owner_stacks(struct file *file, char __user *buf,
>  	if (!kbuf)
>  		return -ENOMEM;
>  
> -	ret += stack_depot_print_stacks_threshold(kbuf, count, pos);
> +	ret += stack_depot_print_stacks_threshold(kbuf, count, pos, threshold);
>  	if (copy_to_user(buf, kbuf, ret))
>  		ret = -EFAULT;
>  
> @@ -683,6 +685,51 @@ static ssize_t read_page_owner_stacks(struct file *file, char __user *buf,
>  	return ret;
>  }
>  
> +static int page_owner_threshold_show(struct seq_file *p, void *v)
> +{
> +	 seq_printf(p, "%lu\n", threshold);
> +	return 0;
> +}
> +
> +static ssize_t write_page_owner_threshold(struct file *file, const char __user *buf,
> +					  size_t count, loff_t *pos)
> +{
> +	char *kbuf;
> +	int ret = 0;
> +
> +	count = min_t(size_t, count, PAGE_SIZE);
> +	kbuf = kmalloc(count, GFP_KERNEL);
> +	if (!kbuf)
> +		return -ENOMEM;
> +
> +	if (copy_from_user(kbuf, buf, count)) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	kbuf[count - 1] = '\0';
> +
> +	ret = kstrtoul(kbuf, 10, &threshold);
> +
> +out:
> +	kfree(kbuf);
> +	return ret ? ret : count;
> +}
> +
> +static int open_page_owner_threshold(struct inode *inode, struct file *file)
> +{
> +	return single_open(file, page_owner_threshold_show, NULL);
> +}
> +
> +
> +static const struct file_operations proc_page_owner_threshold = {
> +	.open = open_page_owner_threshold,
> +	.read = seq_read,
> +	.llseek = seq_lseek,
> +	.write = write_page_owner_threshold,
> +	.release = single_release,
> +};
> +
>  static const struct file_operations proc_page_owner_stacks = {
>  	.read = read_page_owner_stacks,
>  };
> @@ -702,6 +749,8 @@ static int __init pageowner_init(void)
>  			    &proc_page_owner_operations);
>  	debugfs_create_file("page_owner_stacks", 0400, NULL, NULL,
>  			    &proc_page_owner_stacks);
> +	debugfs_create_file("page_owner_threshold", 0600, NULL, NULL,
> +			     &proc_page_owner_threshold);
>  
>  	return 0;
>  }
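Once the series lands, the two files would presumably be exercised from userspace along these lines (assuming debugfs is mounted at /sys/kernel/debug; the paths follow from the debugfs_create_file() calls in the patch):

```sh
# Only show allocation stacks with an outstanding count of at least 100.
echo 100 > /sys/kernel/debug/page_owner_threshold
cat /sys/kernel/debug/page_owner_stacks
```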




end of thread, other threads:[~2022-09-19 15:23 UTC | newest]

Thread overview: 23+ messages
2022-09-05  3:10 [PATCH v2 0/3] page_owner: print stacks and their counter Oscar Salvador
2022-09-05  3:10 ` [PATCH v2 1/3] lib/stackdepot: Add a refcount field in stack_record Oscar Salvador
2022-09-05 20:57   ` Andrey Konovalov
2022-09-06  3:54     ` Oscar Salvador
2022-09-10 22:33       ` Andrey Konovalov
2022-09-19 15:01         ` Vlastimil Babka
2022-09-05  3:10 ` [PATCH v2 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter Oscar Salvador
2022-09-05 12:57   ` Marco Elver
2022-09-05 13:00     ` Marco Elver
2022-09-06  7:43     ` Oscar Salvador
2022-09-06  8:35       ` Marco Elver
2022-09-07  4:00         ` Oscar Salvador
2022-09-07  7:14           ` Marco Elver
2022-09-08  3:32             ` Oscar Salvador
2022-09-08  5:31               ` Marco Elver
2022-09-05 22:20   ` kernel test robot
2022-09-05  3:10 ` [PATCH v2 3/3] mm,page_owner: Filter out stacks by a threshold counter Oscar Salvador
2022-09-05 10:51   ` Ammar Faizi
2022-09-05 11:31     ` Michal Hocko
2022-09-05 11:54       ` Ammar Faizi
2022-09-05 12:02         ` Michal Hocko
2022-09-05 12:42           ` Ammar Faizi
2022-09-19 15:23   ` Vlastimil Babka
