linux-mm.kvack.org archive mirror
* [PATCH v3 0/3] page_owner: print stacks and their counter
@ 2022-11-04  9:33 Oscar Salvador
  2022-11-04  9:33 ` [PATCH v3 1/3] lib/stackdepot: Add a refcount field in stack_record Oscar Salvador
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Oscar Salvador @ 2022-11-04  9:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka,
	Eric Dumazet, Waiman Long, Suren Baghdasaryan, Marco Elver,
	Andrey Konovalov, Alexander Potapenko, Oscar Salvador

Changes v2 -> v3:
     - Replace the interface in favor of seq operations (suggested by Vlastimil)
     - Use the debugfs interface to store/read values (suggested by Ammar)

Hi,

page_owner is a great debug functionality that tells us about all pages
that have been allocated/freed and their stacktrace.
This comes in very handy when e.g. debugging leaks, as with some scripting
we can spot those stacktraces that are allocating pages
but not freeing them.

In my experience, that is one of the most useful cases, but it can get
really tedious to screen through all pages and try to reconstruct the
stack <-> allocated/freed relationship. There is a lot of noise
to filter out.

This patchset aims to fix that by adding new functionality to page_owner.
It creates a new read-only file, "page_owner_stacks", which prints only
the allocating stacktraces and their count, i.e. the number of times each
stacktrace has allocated minus the number of times it has freed.

So we have a clear overview of the stacks <-> allocated/freed relationship
without the need to fiddle with pages and try to match free stacktraces
with allocated ones.

This is achieved by adding a new refcount_t field in the stack_record struct,
incrementing that refcount_t every time the same stacktrace allocates,
and decrementing it when it frees a page. Details can be seen in the
respective patches.

We also create another file called "page_owner_threshold", which lets us
specify a threshold, so that when reading from "page_owner_stacks",
we will only see those stacktraces whose count meets or exceeds the
threshold we specified.

One thing I am not completely happy about is polluting the lib/stackdepot.c
file with the stack_* functions.
We could sort that out if the stack_record struct definition were in a header
file instead of stackdepot.c.
But I am not sure about that trade-off, so suggestions are welcome.
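
Just to illustrate that trade-off, a rough sketch of what such a header could
expose (hypothetical, for discussion only; the fields are the ones this series
relies on):

struct stack_record {
	struct stack_record *next;	/* Link in the hash bucket */
	u32 hash;			/* Hash in the hashtable */
	u32 size;			/* Number of frames in the stack */
	union handle_parts handle;
	refcount_t count;		/* Number of the same repeated stacks */
	unsigned long entries[];	/* Variable-sized array of entries */
};

With that definition visible outside stackdepot.c, the stack_* helpers could
in principle live in mm/page_owner.c, assuming the hash table itself were
also made reachable from there.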

A PoC can be found below:

# cat /sys/kernel/debug/page_owner_threshold
 0
# cat /sys/kernel/debug/page_owner_stacks > stacks_full.txt
# head -32 stacks_full.txt
 prep_new_page+0x10d/0x180
 get_page_from_freelist+0x1bd6/0x1e10
 __alloc_pages+0x194/0x360
 alloc_page_interleave+0x13/0x90
 new_slab+0x31d/0x530
 ___slab_alloc+0x5d7/0x720
 __slab_alloc.isra.85+0x4a/0x90
 kmem_cache_alloc+0x455/0x4a0
 acpi_ps_alloc_op+0x57/0x8f
 acpi_ps_create_scope_op+0x12/0x23
 acpi_ps_execute_method+0x102/0x2c1
 acpi_ns_evaluate+0x343/0x4da
 acpi_evaluate_object+0x1cb/0x392
 acpi_run_osc+0x135/0x260
 acpi_init+0x165/0x4ed
 do_one_initcall+0x3e/0x200
stack count: 2

 free_pcp_prepare+0x287/0x5c0
 free_unref_page+0x1c/0xd0
 __mmdrop+0x50/0x160
 finish_task_switch+0x249/0x2b0
 __schedule+0x2c3/0x960
 schedule+0x44/0xb0
 futex_wait_queue+0x70/0xd0
 futex_wait+0x160/0x250
 do_futex+0x11c/0x1b0
 __x64_sys_futex+0x5e/0x1d0
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 1

 

# echo 10000 > /sys/kernel/debug/page_owner_threshold
# cat /sys/kernel/debug/page_owner_stacks > stacks_10000.txt
# cat stacks_10000.txt 
 prep_new_page+0x10d/0x180
 get_page_from_freelist+0x1bd6/0x1e10
 __alloc_pages+0x194/0x360
 folio_alloc+0x17/0x40
 page_cache_ra_unbounded+0x96/0x170
 filemap_get_pages+0x23d/0x5e0
 filemap_read+0xbf/0x3a0
 __kernel_read+0x136/0x2f0
 kernel_read_file+0x197/0x2d0
 kernel_read_file_from_fd+0x54/0x90
 __do_sys_finit_module+0x89/0x120
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 36195

 prep_new_page+0x10d/0x180
 get_page_from_freelist+0x1bd6/0x1e10
 __alloc_pages+0x194/0x360
 folio_alloc+0x17/0x40
 page_cache_ra_unbounded+0x96/0x170
 filemap_get_pages+0x23d/0x5e0
 filemap_read+0xbf/0x3a0
 new_sync_read+0x106/0x180
 vfs_read+0x16f/0x190
 ksys_read+0xa5/0xe0
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 44484

 prep_new_page+0x10d/0x180
 get_page_from_freelist+0x1bd6/0x1e10
 __alloc_pages+0x194/0x360
 folio_alloc+0x17/0x40
 page_cache_ra_unbounded+0x96/0x170
 filemap_get_pages+0xdd/0x5e0
 filemap_read+0xbf/0x3a0
 new_sync_read+0x106/0x180
 vfs_read+0x16f/0x190
 ksys_read+0xa5/0xe0
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack count: 17874

Oscar Salvador (3):
  lib/stackdepot: Add a refcount field in stack_record
  mm, page_owner: Add page_owner_stacks file to print out only stacks
    and their counter
  mm,page_owner: Filter out stacks by a threshold counter

 include/linux/stackdepot.h |  21 ++++-
 lib/stackdepot.c           | 167 +++++++++++++++++++++++++++++++++----
 mm/kasan/common.c          |   3 +-
 mm/page_owner.c            |  48 ++++++++++-
 4 files changed, 219 insertions(+), 20 deletions(-)

-- 
2.35.3




* [PATCH v3 1/3] lib/stackdepot: Add a refcount field in stack_record
  2022-11-04  9:33 [PATCH v3 0/3] page_owner: print stacks and their counter Oscar Salvador
@ 2022-11-04  9:33 ` Oscar Salvador
  2022-11-04 16:14   ` Alexander Potapenko
  2022-11-04  9:33 ` [PATCH v3 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter Oscar Salvador
  2022-11-04  9:33 ` [PATCH v3 3/3] mm,page_owner: Filter out stacks by a threshold counter Oscar Salvador
  2 siblings, 1 reply; 5+ messages in thread
From: Oscar Salvador @ 2022-11-04  9:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka,
	Eric Dumazet, Waiman Long, Suren Baghdasaryan, Marco Elver,
	Andrey Konovalov, Alexander Potapenko, Oscar Salvador

We want to filter the page_owner output and print only those
stacks that have been repeated beyond a certain threshold.
This gives us the chance to get rid of a lot of noise.
In order to do that, we need to keep track of how many repeated
(allocation) stacks we have, so we add a new refcount_t field
to the stack_record struct.

Note that this might increase the size of the struct for some
architectures.
E.g. x86_64 is not affected due to alignment, but 32-bit x86 might be.

The alternative would be to have some kind of struct like this:

struct track_stacks {
	struct stack_record *stack;
	struct track_stacks *next;
	refcount_t stack_count;
};

But that would mean performing more allocations and gluing everything
together, which would make the code more complex, so I think that
going with a new field in struct stack_record is good enough.

Note that on __set_page_owner_handle(), page_owner->handle is set,
and on __reset_page_owner(), page_owner->free_handle is set.

We are interested in page_owner->handle, so when __set_page_owner()
gets called, we derive the stack_record struct from page_owner->handle,
and we increment its refcount_t field; and when __reset_page_owner()
gets called, we derive its stack_record from page_owner->handle
and we decrement its refcount_t field.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/stackdepot.h | 13 ++++++-
 lib/stackdepot.c           | 79 +++++++++++++++++++++++++++++++-------
 mm/kasan/common.c          |  3 +-
 mm/page_owner.c            | 14 +++++--
 4 files changed, 89 insertions(+), 20 deletions(-)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index bc2797955de9..4e3a88f135ee 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -15,9 +15,16 @@
 
 typedef u32 depot_stack_handle_t;
 
+enum stack_depot_action {
+	STACK_DEPOT_ACTION_NONE,
+	STACK_DEPOT_ACTION_COUNT,
+};
+
 depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 					unsigned int nr_entries,
-					gfp_t gfp_flags, bool can_alloc);
+					gfp_t gfp_flags, bool can_alloc,
+					enum stack_depot_action action);
+void stack_depot_dec_count(depot_stack_handle_t handle);
 
 /*
  * Every user of stack depot has to call stack_depot_init() during its own init
@@ -55,6 +62,10 @@ static inline int stack_depot_early_init(void)	{ return 0; }
 
 depot_stack_handle_t stack_depot_save(unsigned long *entries,
 				      unsigned int nr_entries, gfp_t gfp_flags);
+depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
+					     unsigned int nr_entries,
+					     gfp_t gfp_flags,
+					     enum stack_depot_action action);
 
 unsigned int stack_depot_fetch(depot_stack_handle_t handle,
 			       unsigned long **entries);
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index e73fda23388d..a806ef58a385 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -64,6 +64,7 @@ struct stack_record {
 	u32 hash;			/* Hash in the hastable */
 	u32 size;			/* Number of frames in the stack */
 	union handle_parts handle;
+	refcount_t count;		/* Number of the same repeated stacks */
 	unsigned long entries[];	/* Variable-sized array of entries. */
 };
 
@@ -140,6 +141,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 	stack->handle.slabindex = depot_index;
 	stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
 	stack->handle.valid = 1;
+	refcount_set(&stack->count, 1);
 	memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
 	depot_offset += required_size;
 
@@ -341,6 +343,29 @@ void stack_depot_print(depot_stack_handle_t stack)
 }
 EXPORT_SYMBOL_GPL(stack_depot_print);
 
+static struct stack_record *stack_depot_getstack(depot_stack_handle_t handle)
+{
+	union handle_parts parts = { .handle = handle };
+	void *slab;
+	size_t offset = parts.offset << STACK_ALLOC_ALIGN;
+	struct stack_record *stack;
+
+	if(!handle)
+		return NULL;
+
+	if (parts.slabindex > depot_index) {
+		WARN(1, "slab index %d out of bounds (%d) for stack id %08x\n",
+		     parts.slabindex, depot_index, handle);
+		return NULL;
+	}
+	slab = stack_slabs[parts.slabindex];
+	if (!slab)
+		return NULL;
+
+	stack = slab + offset;
+	return stack;
+}
+
 /**
  * stack_depot_fetch - Fetch stack entries from a depot
  *
@@ -353,30 +378,42 @@ EXPORT_SYMBOL_GPL(stack_depot_print);
 unsigned int stack_depot_fetch(depot_stack_handle_t handle,
 			       unsigned long **entries)
 {
-	union handle_parts parts = { .handle = handle };
-	void *slab;
-	size_t offset = parts.offset << STACK_ALLOC_ALIGN;
 	struct stack_record *stack;
 
 	*entries = NULL;
 	if (!handle)
 		return 0;
 
-	if (parts.slabindex > depot_index) {
-		WARN(1, "slab index %d out of bounds (%d) for stack id %08x\n",
-			parts.slabindex, depot_index, handle);
-		return 0;
-	}
-	slab = stack_slabs[parts.slabindex];
-	if (!slab)
+	stack = stack_depot_getstack(handle);
+	if (!stack)
 		return 0;
-	stack = slab + offset;
 
 	*entries = stack->entries;
 	return stack->size;
 }
 EXPORT_SYMBOL_GPL(stack_depot_fetch);
 
+static void stack_depot_inc_count(struct stack_record *stack)
+{
+	refcount_inc(&stack->count);
+}
+
+void stack_depot_dec_count(depot_stack_handle_t handle)
+{
+	struct stack_record *stack = NULL;
+
+	stack = stack_depot_getstack(handle);
+	if (stack) {
+		/*
+		 * page_owner creates some stacks via create_dummy_stack().
+		 * We are not interested in those, so make sure we only decrement
+		 * "valid" stacks.
+		 */
+		if (refcount_read(&stack->count) > 1)
+			refcount_dec(&stack->count);
+	}
+}
+
 /**
  * __stack_depot_save - Save a stack trace from an array
  *
@@ -402,7 +439,8 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
  */
 depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 					unsigned int nr_entries,
-					gfp_t alloc_flags, bool can_alloc)
+					gfp_t alloc_flags, bool can_alloc,
+					enum stack_depot_action action)
 {
 	struct stack_record *found = NULL, **bucket;
 	depot_stack_handle_t retval = 0;
@@ -488,8 +526,11 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 		/* Nobody used this memory, ok to free it. */
 		free_pages((unsigned long)prealloc, STACK_ALLOC_ORDER);
 	}
-	if (found)
+	if (found) {
 		retval = found->handle.handle;
+		if (action == STACK_DEPOT_ACTION_COUNT)
+			stack_depot_inc_count(found);
+	}
 fast_exit:
 	return retval;
 }
@@ -511,6 +552,16 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
 				      unsigned int nr_entries,
 				      gfp_t alloc_flags)
 {
-	return __stack_depot_save(entries, nr_entries, alloc_flags, true);
+	return __stack_depot_save(entries, nr_entries, alloc_flags, true,
+				  STACK_DEPOT_ACTION_NONE);
 }
 EXPORT_SYMBOL_GPL(stack_depot_save);
+
+depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
+					     unsigned int nr_entries,
+					     gfp_t alloc_flags,
+					     enum stack_depot_action action)
+{
+	return __stack_depot_save(entries, nr_entries, alloc_flags, true, action);
+}
+EXPORT_SYMBOL_GPL(stack_depot_save_action);
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 69f583855c8b..8077c6e70815 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -36,7 +36,8 @@ depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc)
 	unsigned int nr_entries;
 
 	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
-	return __stack_depot_save(entries, nr_entries, flags, can_alloc);
+	return __stack_depot_save(entries, nr_entries, flags, can_alloc,
+				  STACK_DEPOT_ACTION_NONE);
 }
 
 void kasan_set_track(struct kasan_track *track, gfp_t flags)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index e4c6f3f1695b..8730f377fa91 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -106,7 +106,8 @@ static inline struct page_owner *get_page_owner(struct page_ext *page_ext)
 	return (void *)page_ext + page_owner_ops.offset;
 }
 
-static noinline depot_stack_handle_t save_stack(gfp_t flags)
+static noinline depot_stack_handle_t save_stack(gfp_t flags,
+						enum stack_depot_action action)
 {
 	unsigned long entries[PAGE_OWNER_STACK_DEPTH];
 	depot_stack_handle_t handle;
@@ -125,7 +126,7 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
 	current->in_page_owner = 1;
 
 	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
-	handle = stack_depot_save(entries, nr_entries, flags);
+	handle = stack_depot_save_action(entries, nr_entries, flags, action);
 	if (!handle)
 		handle = failure_handle;
 
@@ -138,6 +139,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
 	int i;
 	struct page_ext *page_ext;
 	depot_stack_handle_t handle;
+	depot_stack_handle_t alloc_handle;
 	struct page_owner *page_owner;
 	u64 free_ts_nsec = local_clock();
 
@@ -145,7 +147,10 @@ void __reset_page_owner(struct page *page, unsigned short order)
 	if (unlikely(!page_ext))
 		return;
 
-	handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
+	page_owner = get_page_owner(page_ext);
+	alloc_handle = page_owner->handle;
+
+	handle = save_stack(GFP_NOWAIT | __GFP_NOWARN, STACK_DEPOT_ACTION_NONE);
 	for (i = 0; i < (1 << order); i++) {
 		__clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
 		page_owner = get_page_owner(page_ext);
@@ -153,6 +158,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
 		page_owner->free_ts_nsec = free_ts_nsec;
 		page_ext = page_ext_next(page_ext);
 	}
+	stack_depot_dec_count(alloc_handle);
 }
 
 static inline void __set_page_owner_handle(struct page_ext *page_ext,
@@ -189,7 +195,7 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
 	if (unlikely(!page_ext))
 		return;
 
-	handle = save_stack(gfp_mask);
+	handle = save_stack(gfp_mask, STACK_DEPOT_ACTION_COUNT);
 	__set_page_owner_handle(page_ext, handle, order, gfp_mask);
 }
 
-- 
2.35.3




* [PATCH v3 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter
  2022-11-04  9:33 [PATCH v3 0/3] page_owner: print stacks and their counter Oscar Salvador
  2022-11-04  9:33 ` [PATCH v3 1/3] lib/stackdepot: Add a refcount field in stack_record Oscar Salvador
@ 2022-11-04  9:33 ` Oscar Salvador
  2022-11-04  9:33 ` [PATCH v3 3/3] mm,page_owner: Filter out stacks by a threshold counter Oscar Salvador
  2 siblings, 0 replies; 5+ messages in thread
From: Oscar Salvador @ 2022-11-04  9:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka,
	Eric Dumazet, Waiman Long, Suren Baghdasaryan, Marco Elver,
	Andrey Konovalov, Alexander Potapenko, Oscar Salvador

We might only be interested in the stacks <-> count
relationship, so instead of having to fiddle with the page_owner
output and screen through pfns, let us add a new file called
'page_owner_stacks' that does just that.
By cat-ing that file, we get all the stacktraces followed by
their counter, so we can have a more global view.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/stackdepot.h |  5 +++
 lib/stackdepot.c           | 73 ++++++++++++++++++++++++++++++++++++++
 mm/page_owner.c            | 29 +++++++++++++++
 3 files changed, 107 insertions(+)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 4e3a88f135ee..ca048a79bf7c 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -66,6 +66,11 @@ depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
 					     unsigned int nr_entries,
 					     gfp_t gfp_flags,
 					     enum stack_depot_action action);
+#ifdef CONFIG_PAGE_OWNER
+void *stack_start(struct seq_file *m, loff_t *ppos);
+void *stack_next(struct seq_file *m, void *v, loff_t *ppos);
+int stack_print(struct seq_file *m, void *v);
+#endif
 
 unsigned int stack_depot_fetch(depot_stack_handle_t handle,
 			       unsigned long **entries);
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index a806ef58a385..97e5dce40f4b 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -33,6 +33,8 @@
 #include <linux/types.h>
 #include <linux/memblock.h>
 #include <linux/kasan-enabled.h>
+#include <linux/seq_file.h>
+#include <linux/debugfs.h>
 
 #define DEPOT_STACK_BITS (sizeof(depot_stack_handle_t) * 8)
 
@@ -565,3 +567,74 @@ depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
 	return __stack_depot_save(entries, nr_entries, alloc_flags, true, action);
 }
 EXPORT_SYMBOL_GPL(stack_depot_save_action);
+
+#ifdef CONFIG_PAGE_OWNER
+void *stack_start(struct seq_file *m, loff_t *ppos)
+{
+	unsigned long *table = m->private;
+	struct stack_record **stacks, *stack;
+
+	/* First time */
+	if (*ppos == 0)
+		*table = 0;
+
+	if (*ppos == -1UL)
+		return NULL;
+
+	stacks = &stack_table[*table];
+	stack = (struct stack_record *)stacks;
+
+	return stack;
+}
+
+void *stack_next(struct seq_file *m, void *v, loff_t *ppos)
+{
+	unsigned long *table = m->private;
+	unsigned long nr_table = *table;
+	struct stack_record *next = NULL, *stack = v, **stacks;
+	unsigned long stack_table_entries = stack_hash_mask + 1;
+
+	if (!stack) {
+new_table:
+		/* New table */
+		nr_table++;
+		if (nr_table >= stack_table_entries)
+			goto out;
+		stacks = &stack_table[nr_table];
+		stack = (struct stack_record *)stacks;
+		next = stack;
+	} else {
+		next = stack->next;
+	}
+
+	if (!next)
+		goto new_table;
+
+out:
+	*table = nr_table;
+	*ppos = (nr_table >= stack_table_entries) ? -1UL : *ppos + 1;
+	return next;
+}
+
+int stack_print(struct seq_file *m, void *v)
+{
+	char *buf;
+	int ret = 0;
+	struct stack_record *stack = v;
+
+	if (!stack->size || stack->size < 0 ||
+	    stack->size > PAGE_SIZE || stack->handle.valid != 1 ||
+	    refcount_read(&stack->count) < 1)
+		return 0;
+
+	buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
+	ret += stack_trace_snprint(buf, PAGE_SIZE, stack->entries, stack->size, 0);
+	scnprintf(buf + ret, PAGE_SIZE - ret, "stack count: %d\n\n",
+		  refcount_read(&stack->count));
+	seq_printf(m, "%s", buf);
+	seq_puts(m, "\n\n");
+	kfree(buf);
+
+        return 0;
+}
+#endif
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 8730f377fa91..3808c0985060 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -668,6 +668,32 @@ static const struct file_operations proc_page_owner_operations = {
 	.read		= read_page_owner,
 };
 
+static void stack_stop(struct seq_file *m, void *v)
+{
+	return;
+}
+
+static const struct seq_operations page_owner_stack_op = {
+	.start  = stack_start,
+	.next   = stack_next,
+	.stop   = stack_stop,
+	.show   = stack_print
+};
+
+static int page_owner_stack_open(struct inode *inode, struct file *file)
+{
+	return seq_open_private(file, &page_owner_stack_op,
+				sizeof(unsigned long));
+}
+
+
+const struct file_operations page_owner_stack_operations = {
+	.open           = page_owner_stack_open,
+	.read           = seq_read,
+	.llseek         = seq_lseek,
+	.release        = seq_release,
+};
+
 static int __init pageowner_init(void)
 {
 	if (!static_branch_unlikely(&page_owner_inited)) {
@@ -678,6 +704,9 @@ static int __init pageowner_init(void)
 	debugfs_create_file("page_owner", 0400, NULL, NULL,
 			    &proc_page_owner_operations);
 
+	debugfs_create_file("page_owner_stacks", S_IRUSR, NULL, NULL,
+			     &page_owner_stack_operations);
+
 	return 0;
 }
 late_initcall(pageowner_init)
-- 
2.35.3




* [PATCH v3 3/3] mm,page_owner: Filter out stacks by a threshold counter
  2022-11-04  9:33 [PATCH v3 0/3] page_owner: print stacks and their counter Oscar Salvador
  2022-11-04  9:33 ` [PATCH v3 1/3] lib/stackdepot: Add a refcount field in stack_record Oscar Salvador
  2022-11-04  9:33 ` [PATCH v3 2/3] mm, page_owner: Add page_owner_stacks file to print out only stacks and their counter Oscar Salvador
@ 2022-11-04  9:33 ` Oscar Salvador
  2 siblings, 0 replies; 5+ messages in thread
From: Oscar Salvador @ 2022-11-04  9:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka,
	Eric Dumazet, Waiman Long, Suren Baghdasaryan, Marco Elver,
	Andrey Konovalov, Alexander Potapenko, Oscar Salvador

We want to be able to filter the output on a threshold basis; this way
we can get rid of a lot of noise and focus only on those
stacks that have a sufficiently high counter.

We can control the threshold value through a new file called
'page_owner_threshold', which is 0 by default.
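
As an example of the intended usage (the threshold value here is arbitrary):

 # echo 1000 > /sys/kernel/debug/page_owner_threshold
 # cat /sys/kernel/debug/page_owner_stacks

would then only print those stacks whose counter is at least 1000.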

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/stackdepot.h |  3 +++
 lib/stackdepot.c           | 17 ++++++++++++++++-
 mm/page_owner.c            |  5 +++++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index ca048a79bf7c..c998b8024534 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -70,6 +70,9 @@ depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
 void *stack_start(struct seq_file *m, loff_t *ppos);
 void *stack_next(struct seq_file *m, void *v, loff_t *ppos);
 int stack_print(struct seq_file *m, void *v);
+
+int page_owner_threshold_get(void *data, u64 *val);
+int page_owner_threshold_set(void *data, u64 val);
 #endif
 
 unsigned int stack_depot_fetch(depot_stack_handle_t handle,
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 97e5dce40f4b..b70d081d7c61 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -569,6 +569,9 @@ depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
 EXPORT_SYMBOL_GPL(stack_depot_save_action);
 
 #ifdef CONFIG_PAGE_OWNER
+
+static unsigned long page_owner_stack_threshold = 0;
+
 void *stack_start(struct seq_file *m, loff_t *ppos)
 {
 	unsigned long *table = m->private;
@@ -624,7 +627,7 @@ int stack_print(struct seq_file *m, void *v)
 
 	if (!stack->size || stack->size < 0 ||
 	    stack->size > PAGE_SIZE || stack->handle.valid != 1 ||
-	    refcount_read(&stack->count) < 1)
+	    refcount_read(&stack->count) < page_owner_stack_threshold)
 		return 0;
 
 	buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
@@ -637,4 +640,16 @@ int stack_print(struct seq_file *m, void *v)
 
         return 0;
 }
+
+int page_owner_threshold_get(void *data, u64 *val)
+{
+        *val = page_owner_stack_threshold;
+        return 0;
+}
+
+int page_owner_threshold_set(void *data, u64 val)
+{
+        page_owner_stack_threshold = val;
+        return 0;
+}
 #endif
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 3808c0985060..3e05ebd25ba0 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -694,6 +694,9 @@ const struct file_operations page_owner_stack_operations = {
 	.release        = seq_release,
 };
 
+DEFINE_SIMPLE_ATTRIBUTE(proc_page_owner_threshold, &page_owner_threshold_get,
+                        &page_owner_threshold_set, "%llu");
+
 static int __init pageowner_init(void)
 {
 	if (!static_branch_unlikely(&page_owner_inited)) {
@@ -706,6 +709,8 @@ static int __init pageowner_init(void)
 
 	debugfs_create_file("page_owner_stacks", S_IRUSR, NULL, NULL,
 			     &page_owner_stack_operations);
+	debugfs_create_file("page_owner_threshold", 0600, NULL, NULL,
+			    &proc_page_owner_threshold);
 
 	return 0;
 }
-- 
2.35.3




* Re: [PATCH v3 1/3] lib/stackdepot: Add a refcount field in stack_record
  2022-11-04  9:33 ` [PATCH v3 1/3] lib/stackdepot: Add a refcount field in stack_record Oscar Salvador
@ 2022-11-04 16:14   ` Alexander Potapenko
  0 siblings, 0 replies; 5+ messages in thread
From: Alexander Potapenko @ 2022-11-04 16:14 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko,
	Vlastimil Babka, Eric Dumazet, Waiman Long, Suren Baghdasaryan,
	Marco Elver, Andrey Konovalov

On Fri, Nov 4, 2022 at 10:34 AM Oscar Salvador <osalvador@suse.de> wrote:
>
> We want to filter out page_owner output and print only those
> stacks that have been repeated beyond a certain threshold.
> This gives us the chance to get rid of a lot of noise.
> In order to do that, we need to keep track of how many repeated stacks
> (for allocation) do we have, so we add a new refcount_t field
> in the stack_record struct.
>
> Note that this might increase the size of the struct for some
> architectures.
> E.g: x86_64 is not affected due to alignment, but x86 32bits might.
>
> The alternative would be to have some kind of struct like this:
>
> struct track_stacks {
>         struct stack_record *stack;
>         struct track_stacks *next;
>         refcount_t stack_count;
>
> But ithat would imply to perform more allocations and glue everything
> together, which would make the code more complex, so I think that
> going with a new field in the struct stack_record is good enough.
>
> Note that on __set_page_owner_handle(), page_owner->handle is set,
> and on __reset_page_owner(), page_owner->free_handle is set.
>
> We are interested in page_owner->handle, so when __set_page_owner()
> gets called, we derive the stack_record struct from page_owner->handle,
> and we increment its refcount_t field; and when __reset_page_owner()
> gets called, we derive its stack_record from page_owner->handle()
> and we decrement its refcount_t field.
>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  include/linux/stackdepot.h | 13 ++++++-
>  lib/stackdepot.c           | 79 +++++++++++++++++++++++++++++++-------
>  mm/kasan/common.c          |  3 +-
>  mm/page_owner.c            | 14 +++++--
>  4 files changed, 89 insertions(+), 20 deletions(-)
>
> diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
> index bc2797955de9..4e3a88f135ee 100644
> --- a/include/linux/stackdepot.h
> +++ b/include/linux/stackdepot.h
> @@ -15,9 +15,16 @@
>
>  typedef u32 depot_stack_handle_t;
>
> +enum stack_depot_action {
> +       STACK_DEPOT_ACTION_NONE,
> +       STACK_DEPOT_ACTION_COUNT,
> +};

For the sake of simplicity, can it be just a boolean? What other kinds
of actions do you anticipate?
Another question is maybe we can afford incrementing the counter
unconditionally? It costs almost nothing compared to stack
unwinding/hashing, and this is a debug-only feature anyway (right?)
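
(For reference, the boolean variant would boil down to something like this --
just a sketch, parameter name made up:

depot_stack_handle_t __stack_depot_save(unsigned long *entries,
					unsigned int nr_entries,
					gfp_t gfp_flags, bool can_alloc,
					bool refcount);

i.e. the bool simply replaces the enum stack_depot_action argument.)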

>  depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>                                         unsigned int nr_entries,
> -                                       gfp_t gfp_flags, bool can_alloc);
> +                                       gfp_t gfp_flags, bool can_alloc,
> +                                       enum stack_depot_action action);
> +void stack_depot_dec_count(depot_stack_handle_t handle);
>
>  /*
>   * Every user of stack depot has to call stack_depot_init() during its own init
> @@ -55,6 +62,10 @@ static inline int stack_depot_early_init(void)       { return 0; }
>
>  depot_stack_handle_t stack_depot_save(unsigned long *entries,
>                                       unsigned int nr_entries, gfp_t gfp_flags);
> +depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
> +                                            unsigned int nr_entries,
> +                                            gfp_t gfp_flags,
> +                                            enum stack_depot_action action);
>
>  unsigned int stack_depot_fetch(depot_stack_handle_t handle,
>                                unsigned long **entries);
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index e73fda23388d..a806ef58a385 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -64,6 +64,7 @@ struct stack_record {
>         u32 hash;                       /* Hash in the hastable */
>         u32 size;                       /* Number of frames in the stack */
>         union handle_parts handle;
> +       refcount_t count;               /* Number of the same repeated stacks */
>         unsigned long entries[];        /* Variable-sized array of entries. */
>  };
>
> @@ -140,6 +141,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
>         stack->handle.slabindex = depot_index;
>         stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
>         stack->handle.valid = 1;
> +       refcount_set(&stack->count, 1);

If we decide to only maintain the refcount for the page owner,
wouldn't it be simpler to decouple the counter increment from stack
allocation?
In that case other users won't need to be updated, and page owner will
just get an extra call to stack_depot_inc_count().
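
To make that concrete, a rough sketch of the decoupled variant (not part of
the posted series) could be:

void stack_depot_inc_count(depot_stack_handle_t handle)
{
	struct stack_record *stack = stack_depot_getstack(handle);

	if (stack)
		refcount_inc(&stack->count);
}

with __set_page_owner() calling stack_depot_inc_count(handle) right after
save_stack(), and __stack_depot_save() left untouched.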

>         memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
>         depot_offset += required_size;
>
> @@ -341,6 +343,29 @@ void stack_depot_print(depot_stack_handle_t stack)
>  }
>  EXPORT_SYMBOL_GPL(stack_depot_print);
>
> +static struct stack_record *stack_depot_getstack(depot_stack_handle_t handle)
> +{
> +       union handle_parts parts = { .handle = handle };
> +       void *slab;
> +       size_t offset = parts.offset << STACK_ALLOC_ALIGN;
> +       struct stack_record *stack;
> +
> +       if(!handle)
> +               return NULL;
> +
> +       if (parts.slabindex > depot_index) {
> +               WARN(1, "slab index %d out of bounds (%d) for stack id %08x\n",
> +                    parts.slabindex, depot_index, handle);
> +               return NULL;
> +       }
> +       slab = stack_slabs[parts.slabindex];
> +       if (!slab)
> +               return NULL;
> +
> +       stack = slab + offset;
> +       return stack;
> +}
> +
>  /**
>   * stack_depot_fetch - Fetch stack entries from a depot
>   *
> @@ -353,30 +378,42 @@ EXPORT_SYMBOL_GPL(stack_depot_print);
>  unsigned int stack_depot_fetch(depot_stack_handle_t handle,
>                                unsigned long **entries)
>  {
> -       union handle_parts parts = { .handle = handle };
> -       void *slab;
> -       size_t offset = parts.offset << STACK_ALLOC_ALIGN;
>         struct stack_record *stack;
>
>         *entries = NULL;
>         if (!handle)
>                 return 0;
>
> -       if (parts.slabindex > depot_index) {
> -               WARN(1, "slab index %d out of bounds (%d) for stack id %08x\n",
> -                       parts.slabindex, depot_index, handle);
> -               return 0;
> -       }
> -       slab = stack_slabs[parts.slabindex];
> -       if (!slab)
> +       stack = stack_depot_getstack(handle);
> +       if (!stack)
>                 return 0;
> -       stack = slab + offset;
>
>         *entries = stack->entries;
>         return stack->size;
>  }
>  EXPORT_SYMBOL_GPL(stack_depot_fetch);
>
> +static void stack_depot_inc_count(struct stack_record *stack)
> +{
> +       refcount_inc(&stack->count);
> +}
> +
> +void stack_depot_dec_count(depot_stack_handle_t handle)
> +{
> +       struct stack_record *stack = NULL;
> +
> +       stack = stack_depot_getstack(handle);
> +       if (stack) {
> +               /*
> +                * page_owner creates some stacks via create_dummy_stack().
> +                * We are not interested in those, so make sure we only decrement
> +                * "valid" stacks.
> +                */
> +               if (refcount_read(&stack->count) > 1)

I don't see why we cannot decrement the refcounter when it is 1 -
looks like a normal situation when the number of deallocations matches
the number of allocations.
But maybe we should warn about an overflow here instead?

> +                       refcount_dec(&stack->count);
> +       }
> +}
> +
>  /**
>   * __stack_depot_save - Save a stack trace from an array
>   *
> @@ -402,7 +439,8 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
>   */
>  depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>                                         unsigned int nr_entries,
> -                                       gfp_t alloc_flags, bool can_alloc)
> +                                       gfp_t alloc_flags, bool can_alloc,
> +                                       enum stack_depot_action action)
>  {
>         struct stack_record *found = NULL, **bucket;
>         depot_stack_handle_t retval = 0;
> @@ -488,8 +526,11 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>                 /* Nobody used this memory, ok to free it. */
>                 free_pages((unsigned long)prealloc, STACK_ALLOC_ORDER);
>         }
> -       if (found)
> +       if (found) {
>                 retval = found->handle.handle;
> +               if (action == STACK_DEPOT_ACTION_COUNT)
> +                       stack_depot_inc_count(found);
> +       }
>  fast_exit:
>         return retval;
>  }
> @@ -511,6 +552,16 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
>                                       unsigned int nr_entries,
>                                       gfp_t alloc_flags)
>  {
> -       return __stack_depot_save(entries, nr_entries, alloc_flags, true);
> +       return __stack_depot_save(entries, nr_entries, alloc_flags, true,
> +                                 STACK_DEPOT_ACTION_NONE);
>  }
>  EXPORT_SYMBOL_GPL(stack_depot_save);
> +
> +depot_stack_handle_t stack_depot_save_action(unsigned long *entries,
> +                                            unsigned int nr_entries,
> +                                            gfp_t alloc_flags,
> +                                            enum stack_depot_action action)
> +{
> +       return __stack_depot_save(entries, nr_entries, alloc_flags, true, action);
> +}
> +EXPORT_SYMBOL_GPL(stack_depot_save_action);
> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> index 69f583855c8b..8077c6e70815 100644
> --- a/mm/kasan/common.c
> +++ b/mm/kasan/common.c
> @@ -36,7 +36,8 @@ depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc)
>         unsigned int nr_entries;
>
>         nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
> -       return __stack_depot_save(entries, nr_entries, flags, can_alloc);
> +       return __stack_depot_save(entries, nr_entries, flags, can_alloc,
> +                                 STACK_DEPOT_ACTION_NONE);
>  }
>
>  void kasan_set_track(struct kasan_track *track, gfp_t flags)
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index e4c6f3f1695b..8730f377fa91 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -106,7 +106,8 @@ static inline struct page_owner *get_page_owner(struct page_ext *page_ext)
>         return (void *)page_ext + page_owner_ops.offset;
>  }
>
> -static noinline depot_stack_handle_t save_stack(gfp_t flags)
> +static noinline depot_stack_handle_t save_stack(gfp_t flags,
> +                                               enum stack_depot_action action)
>  {
>         unsigned long entries[PAGE_OWNER_STACK_DEPTH];
>         depot_stack_handle_t handle;
> @@ -125,7 +126,7 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
>         current->in_page_owner = 1;
>
>         nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
> -       handle = stack_depot_save(entries, nr_entries, flags);
> +       handle = stack_depot_save_action(entries, nr_entries, flags, action);
>         if (!handle)
>                 handle = failure_handle;
>
> @@ -138,6 +139,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
>         int i;
>         struct page_ext *page_ext;
>         depot_stack_handle_t handle;
> +       depot_stack_handle_t alloc_handle;
>         struct page_owner *page_owner;
>         u64 free_ts_nsec = local_clock();
>
> @@ -145,7 +147,10 @@ void __reset_page_owner(struct page *page, unsigned short order)
>         if (unlikely(!page_ext))
>                 return;
>
> -       handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
> +       page_owner = get_page_owner(page_ext);
> +       alloc_handle = page_owner->handle;
> +
> +       handle = save_stack(GFP_NOWAIT | __GFP_NOWARN, STACK_DEPOT_ACTION_NONE);
>         for (i = 0; i < (1 << order); i++) {
>                 __clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
>                 page_owner = get_page_owner(page_ext);
> @@ -153,6 +158,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
>                 page_owner->free_ts_nsec = free_ts_nsec;
>                 page_ext = page_ext_next(page_ext);
>         }
> +       stack_depot_dec_count(alloc_handle);
>  }
>
>  static inline void __set_page_owner_handle(struct page_ext *page_ext,
> @@ -189,7 +195,7 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
>         if (unlikely(!page_ext))
>                 return;
>
> -       handle = save_stack(gfp_mask);
> +       handle = save_stack(gfp_mask, STACK_DEPOT_ACTION_COUNT);
>         __set_page_owner_handle(page_ext, handle, order, gfp_mask);
>  }
>
> --
> 2.35.3
>


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

