* [PATCH v3 0/6] SLUB debugfs improvements based on stackdepot
@ 2022-04-04 16:41 Vlastimil Babka
  2022-04-04 16:41 ` [PATCH v3 1/6] lib/stackdepot: allow requesting early initialization dynamically Vlastimil Babka
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Vlastimil Babka @ 2022-04-04 16:41 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Marco Elver, Mike Rapoport, Hyeonggon Yoo, Imran Khan,
	Vlastimil Babka, Jonathan Corbet, Randy Dunlap

Changes since v2:
https://lore.kernel.org/all/20220302173122.11939-1-vbabka@suse.cz/

- Reworked patch 1 based on feedback from Mike and Marco. Updated patch
  3 accordingly.
- Rebased to v5.18-rc1
- Added acks/reviews. Thanks all.

Hi,

this series combines and revives patches from Oliver's bachelor thesis
from last year (where I was the advisor) that make SLUB's debugfs files
alloc_traces and free_traces more useful.
The resubmission was blocked on stackdepot changes that are now merged,
as explained in patch 3.

Patch 1 makes it possible to use stack depot without bootstrap issues.

Patch 2 is a new preparatory cleanup.

Patch 3, originally submitted here [1], was merged to mainline but then
reverted for stackdepot-related issues, as explained in the patch.

Patches 4-6 were originally submitted as an RFC here [2]. In this
submission I have omitted the new file 'all_objects' (patch 3/3 in [2])
as it might be considered too intrusive, so I will postpone it for
later. The docs patch is adjusted accordingly.

Also available in git, based on v5.18-rc1:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v3r2

I plan to add this to the slab (-next) tree for 5.19. lkp has already
been testing this from my git tree, which helped resolve some more
corner cases, and recently uncovered only a pre-existing kfence bug [3].

[1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
[2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
[3] https://lore.kernel.org/all/8368021e-86c3-a93f-b29d-efed02135c41@suse.cz/


Oliver Glitta (4):
  mm/slub: use stackdepot to save stack trace in objects
  mm/slub: distinguish and print stack traces in debugfs files
  mm/slub: sort debugfs output by frequency of stack traces
  slab, documentation: add description of debugfs files for SLUB caches

Vlastimil Babka (2):
  lib/stackdepot: allow requesting early initialization dynamically
  mm/slub: move struct track init out of set_track()

 Documentation/vm/slub.rst  |  64 ++++++++++++++++++
 include/linux/stackdepot.h |  26 +++++--
 init/Kconfig               |   1 +
 lib/Kconfig.debug          |   1 +
 lib/stackdepot.c           |  66 ++++++++++++------
 mm/page_owner.c            |   9 ++-
 mm/slab_common.c           |   5 ++
 mm/slub.c                  | 135 +++++++++++++++++++++++++------------
 8 files changed, 234 insertions(+), 73 deletions(-)

-- 
2.35.1




* [PATCH v3 1/6] lib/stackdepot: allow requesting early initialization dynamically
  2022-04-04 16:41 [PATCH v3 0/6] SLUB debugfs improvements based on stackdepot Vlastimil Babka
@ 2022-04-04 16:41 ` Vlastimil Babka
  2022-04-05 21:40   ` David Rientjes
  2022-04-04 16:41 ` [PATCH v3 2/6] mm/slub: move struct track init out of set_track() Vlastimil Babka
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Vlastimil Babka @ 2022-04-04 16:41 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Marco Elver, Mike Rapoport, Hyeonggon Yoo, Imran Khan,
	Vlastimil Babka

In a later patch we want to add stackdepot support for object owner
tracking in slub caches, which is enabled by the slub_debug boot parameter.
This creates a bootstrap problem as some caches are created early in
boot when slab_is_available() is false and thus stack_depot_init()
tries to use memblock. But, as reported by Hyeonggon Yoo [1], we are
already beyond memblock_free_all(). Ideally the memblock allocation should
fail, yet it succeeds, but later the system crashes, which is a
separately handled issue.

To resolve this bootstrap issue in a robust way, this patch adds another
way to request stack_depot_early_init(), which happens at a well-defined
point in time. In addition to the build-time CONFIG_STACKDEPOT_ALWAYS_INIT,
code that's e.g. processing boot parameters (which happens early enough)
can call a new function stack_depot_want_early_init(), which sets a flag
that stack_depot_early_init() will check.

In this patch we also convert page_owner to this approach. While it
doesn't have the same bootstrap issue as slub, it's also a functionality
enabled by a boot param and can thus request stack_depot_early_init()
with memblock allocation instead of later initialization with
kvmalloc().

As suggested by Mike, make stack_depot_early_init() only attempt
memblock allocation and stack_depot_init() only attempt kvmalloc().
Also change the latter to kvcalloc(). In both cases we can lose the
explicit array zeroing, which the allocations do already.

As suggested by Marco, provide empty implementations of the init
functions for !CONFIG_STACKDEPOT builds to simplify the callers.

[1] https://lore.kernel.org/all/YhnUcqyeMgCrWZbd@ip-172-31-19-208.ap-northeast-1.compute.internal/
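
For illustration, a minimal sketch of how a boot-parameter handler opts
in (the subsystem name here is made up; the page_owner conversion below
is the real example):

	static bool myfeature_enabled __initdata;

	static int __init early_myfeature_param(char *buf)
	{
		int ret = kstrtobool(buf, &myfeature_enabled);

		/* early_param handlers run before mm_init(), early enough */
		if (myfeature_enabled)
			stack_depot_want_early_init();

		return ret;
	}
	early_param("myfeature", early_myfeature_param);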

Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
---
 include/linux/stackdepot.h | 26 ++++++++++++---
 lib/stackdepot.c           | 66 +++++++++++++++++++++++++-------------
 mm/page_owner.c            |  9 ++++--
 3 files changed, 72 insertions(+), 29 deletions(-)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 17f992fe6355..bc2797955de9 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -20,18 +20,36 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 					gfp_t gfp_flags, bool can_alloc);
 
 /*
- * Every user of stack depot has to call this during its own init when it's
- * decided that it will be calling stack_depot_save() later.
+ * Every user of stack depot has to call stack_depot_init() during its own init
+ * when it's decided that it will be calling stack_depot_save() later. This is
+ * recommended for e.g. modules initialized later in the boot process, when
+ * slab_is_available() is true.
  *
  * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot
  * enabled as part of mm_init(), for subsystems where it's known at compile time
  * that stack depot will be used.
+ *
+ * Another alternative is to call stack_depot_want_early_init(), when the
+ * decision to use stack depot is taken e.g. when evaluating kernel boot
+ * parameters, which precedes the enablement point in mm_init().
+ *
+ * stack_depot_init() and stack_depot_want_early_init() can be called regardless
+ * of CONFIG_STACKDEPOT and are no-op when disabled. The actual save/fetch/print
+ * functions should only be called from code that makes sure CONFIG_STACKDEPOT
+ * is enabled.
  */
+#ifdef CONFIG_STACKDEPOT
 int stack_depot_init(void);
 
-#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
-static inline int stack_depot_early_init(void)	{ return stack_depot_init(); }
+void __init stack_depot_want_early_init(void);
+
+/* This is supposed to be called only from mm_init() */
+int __init stack_depot_early_init(void);
 #else
+static inline int stack_depot_init(void) { return 0; }
+
+static inline void stack_depot_want_early_init(void) { }
+
 static inline int stack_depot_early_init(void)	{ return 0; }
 #endif
 
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index bf5ba9af0500..6c4644c9ed44 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -66,6 +66,9 @@ struct stack_record {
 	unsigned long entries[];	/* Variable-sized array of entries. */
 };
 
+static bool __stack_depot_want_early_init __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT);
+static bool __stack_depot_early_init_passed __initdata;
+
 static void *stack_slabs[STACK_ALLOC_MAX_SLABS];
 
 static int depot_index;
@@ -162,38 +165,57 @@ static int __init is_stack_depot_disabled(char *str)
 }
 early_param("stack_depot_disable", is_stack_depot_disabled);
 
-/*
- * __ref because of memblock_alloc(), which will not be actually called after
- * the __init code is gone, because at that point slab_is_available() is true
- */
-__ref int stack_depot_init(void)
+void __init stack_depot_want_early_init(void)
+{
+	/* Too late to request early init now */
+	WARN_ON(__stack_depot_early_init_passed);
+
+	__stack_depot_want_early_init = true;
+}
+
+int __init stack_depot_early_init(void)
+{
+	size_t size;
+
+	/* This is supposed to be called only once, from mm_init() */
+	if (WARN_ON(__stack_depot_early_init_passed))
+		return 0;
+
+	__stack_depot_early_init_passed = true;
+
+	if (!__stack_depot_want_early_init || stack_depot_disable)
+		return 0;
+
+	pr_info("Stack Depot early init allocating hash table with memblock_alloc\n");
+	size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
+	stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
+
+	if (!stack_table) {
+		pr_err("Stack Depot hash table allocation failed, disabling\n");
+		stack_depot_disable = true;
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+int stack_depot_init(void)
 {
 	static DEFINE_MUTEX(stack_depot_init_mutex);
+	int ret = 0;
 
 	mutex_lock(&stack_depot_init_mutex);
 	if (!stack_depot_disable && !stack_table) {
-		size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
-		int i;
-
-		if (slab_is_available()) {
-			pr_info("Stack Depot allocating hash table with kvmalloc\n");
-			stack_table = kvmalloc(size, GFP_KERNEL);
-		} else {
-			pr_info("Stack Depot allocating hash table with memblock_alloc\n");
-			stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
-		}
-		if (stack_table) {
-			for (i = 0; i < STACK_HASH_SIZE;  i++)
-				stack_table[i] = NULL;
-		} else {
+		pr_info("Stack Depot allocating hash table with kvcalloc\n");
+		stack_table = kvcalloc(STACK_HASH_SIZE, sizeof(struct stack_record *), GFP_KERNEL);
+		if (!stack_table) {
 			pr_err("Stack Depot hash table allocation failed, disabling\n");
 			stack_depot_disable = true;
-			mutex_unlock(&stack_depot_init_mutex);
-			return -ENOMEM;
+			ret = -ENOMEM;
 		}
 	}
 	mutex_unlock(&stack_depot_init_mutex);
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(stack_depot_init);
 
diff --git a/mm/page_owner.c b/mm/page_owner.c
index fb3a05fdebdb..2743062e92c2 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -45,7 +45,12 @@ static void init_early_allocated_pages(void);
 
 static int __init early_page_owner_param(char *buf)
 {
-	return kstrtobool(buf, &page_owner_enabled);
+	int ret = kstrtobool(buf, &page_owner_enabled);
+
+	if (page_owner_enabled)
+		stack_depot_want_early_init();
+
+	return ret;
 }
 early_param("page_owner", early_page_owner_param);
 
@@ -83,8 +88,6 @@ static __init void init_page_owner(void)
 	if (!page_owner_enabled)
 		return;
 
-	stack_depot_init();
-
 	register_dummy_stack();
 	register_failure_stack();
 	register_early_stack();
-- 
2.35.1




* [PATCH v3 2/6] mm/slub: move struct track init out of set_track()
  2022-04-04 16:41 [PATCH v3 0/6] SLUB debugfs improvements based on stackdepot Vlastimil Babka
  2022-04-04 16:41 ` [PATCH v3 1/6] lib/stackdepot: allow requesting early initialization dynamically Vlastimil Babka
@ 2022-04-04 16:41 ` Vlastimil Babka
  2022-04-05 21:40   ` David Rientjes
  2022-04-04 16:41 ` [PATCH v3 3/6] mm/slub: use stackdepot to save stack trace in objects Vlastimil Babka
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Vlastimil Babka @ 2022-04-04 16:41 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Marco Elver, Mike Rapoport, Hyeonggon Yoo, Imran Khan,
	Vlastimil Babka

set_track() either zeroes out the struct track or fills it, depending on
the addr parameter. This is unnecessary as there's only one place that
calls it for the initialization - init_tracking(). We can simply do the
zeroing there, with a single memset() that covers both TRACK_ALLOC and
TRACK_FREE as they are adjacent.
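
For context, the adjacency this relies on is visible in get_track()
(simplified sketch of the mm/slub.c helper; TRACK_ALLOC is 0 and
TRACK_FREE is 1 in the track_item enum):

	static struct track *get_track(struct kmem_cache *s, void *object,
				       enum track_item alloc)
	{
		struct track *p = object + get_info_end(s);

		return kasan_reset_tag(p + alloc);
	}

so one memset() of 2 * sizeof(struct track) starting at the TRACK_ALLOC
record covers the TRACK_FREE record as well.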

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/slub.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 74d92aa4a3a2..cd4fd0159911 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -729,34 +729,32 @@ static void set_track(struct kmem_cache *s, void *object,
 {
 	struct track *p = get_track(s, object, alloc);
 
-	if (addr) {
 #ifdef CONFIG_STACKTRACE
-		unsigned int nr_entries;
+	unsigned int nr_entries;
 
-		metadata_access_enable();
-		nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
-					      TRACK_ADDRS_COUNT, 3);
-		metadata_access_disable();
+	metadata_access_enable();
+	nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
+				      TRACK_ADDRS_COUNT, 3);
+	metadata_access_disable();
 
-		if (nr_entries < TRACK_ADDRS_COUNT)
-			p->addrs[nr_entries] = 0;
+	if (nr_entries < TRACK_ADDRS_COUNT)
+		p->addrs[nr_entries] = 0;
 #endif
-		p->addr = addr;
-		p->cpu = smp_processor_id();
-		p->pid = current->pid;
-		p->when = jiffies;
-	} else {
-		memset(p, 0, sizeof(struct track));
-	}
+	p->addr = addr;
+	p->cpu = smp_processor_id();
+	p->pid = current->pid;
+	p->when = jiffies;
 }
 
 static void init_tracking(struct kmem_cache *s, void *object)
 {
+	struct track *p;
+
 	if (!(s->flags & SLAB_STORE_USER))
 		return;
 
-	set_track(s, object, TRACK_FREE, 0UL);
-	set_track(s, object, TRACK_ALLOC, 0UL);
+	p = get_track(s, object, TRACK_ALLOC);
+	memset(p, 0, 2*sizeof(struct track));
 }
 
 static void print_track(const char *s, struct track *t, unsigned long pr_time)
-- 
2.35.1




* [PATCH v3 3/6] mm/slub: use stackdepot to save stack trace in objects
  2022-04-04 16:41 [PATCH v3 0/6] SLUB debugfs improvements based on stackdepot Vlastimil Babka
  2022-04-04 16:41 ` [PATCH v3 1/6] lib/stackdepot: allow requesting early initialization dynamically Vlastimil Babka
  2022-04-04 16:41 ` [PATCH v3 2/6] mm/slub: move struct track init out of set_track() Vlastimil Babka
@ 2022-04-04 16:41 ` Vlastimil Babka
  2022-04-05 21:40   ` David Rientjes
  2022-04-04 16:41 ` [PATCH v3 4/6] mm/slub: distinguish and print stack traces in debugfs files Vlastimil Babka
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Vlastimil Babka @ 2022-04-04 16:41 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Marco Elver, Mike Rapoport, Hyeonggon Yoo, Imran Khan,
	Vlastimil Babka

From: Oliver Glitta <glittao@gmail.com>

Many stack traces are similar, so there are many similar arrays.
Stackdepot saves each unique stack only once.

Replace field addrs in struct track with depot_stack_handle_t handle.  Use
stackdepot to save stack trace.

The benefits are a smaller memory overhead (in each struct track, a
4-byte depot handle replaces TRACK_ADDRS_COUNT == 16 inline addresses,
i.e. 128 bytes on 64-bit) and the possibility to aggregate per-cache
statistics in the following patch using the stackdepot handle instead
of matching stacks manually.

[ vbabka@suse.cz: rebase to 5.17-rc1 and adjust accordingly ]

This was initially merged as commit 788691464c29 and reverted by commit
ae14c63a9f20 due to several issues that should now be fixed.
The problem of unconditional memory overhead by stackdepot has been
addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init
and stack_table allocation by kvmalloc()"), so the dependency on
stackdepot will result in extra memory usage only when slab cache
tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
The build failures on some architectures were also addressed, and the
reported issue with the xfs/433 test did not reproduce on 5.17-rc1 with
this patch.

Signed-off-by: Oliver Glitta <glittao@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 init/Kconfig      |  1 +
 lib/Kconfig.debug |  1 +
 mm/slab_common.c  |  5 ++++
 mm/slub.c         | 71 ++++++++++++++++++++++++++---------------------
 4 files changed, 47 insertions(+), 31 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index ddcbefe535e9..adc57f989d87 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1875,6 +1875,7 @@ config SLUB_DEBUG
 	default y
 	bool "Enable SLUB debugging support" if EXPERT
 	depends on SLUB && SYSFS
+	select STACKDEPOT if STACKTRACE_SUPPORT
 	help
 	  SLUB has extensive debug support features. Disabling these can
 	  result in significant savings in code size. This also disables
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 075cd25363ac..78d6139111cd 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -709,6 +709,7 @@ config DEBUG_SLAB
 config SLUB_DEBUG_ON
 	bool "SLUB debugging on by default"
 	depends on SLUB && SLUB_DEBUG
+	select STACKDEPOT_ALWAYS_INIT if STACKTRACE_SUPPORT
 	default n
 	help
 	  Boot with debugging on by default. SLUB boots by default with
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 6ee64d6208b3..73943479a2b7 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -24,6 +24,7 @@
 #include <asm/tlbflush.h>
 #include <asm/page.h>
 #include <linux/memcontrol.h>
+#include <linux/stackdepot.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/kmem.h>
@@ -314,9 +315,13 @@ kmem_cache_create_usercopy(const char *name,
 	 * If no slub_debug was enabled globally, the static key is not yet
 	 * enabled by setup_slub_debug(). Enable it if the cache is being
 	 * created with any of the debugging flags passed explicitly.
+	 * It's also possible that this is the first cache created with
+	 * SLAB_STORE_USER and we should init stack_depot for it.
 	 */
 	if (flags & SLAB_DEBUG_FLAGS)
 		static_branch_enable(&slub_debug_enabled);
+	if (flags & SLAB_STORE_USER)
+		stack_depot_init();
 #endif
 
 	mutex_lock(&slab_mutex);
diff --git a/mm/slub.c b/mm/slub.c
index cd4fd0159911..98c1450c23f0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -26,6 +26,7 @@
 #include <linux/cpuset.h>
 #include <linux/mempolicy.h>
 #include <linux/ctype.h>
+#include <linux/stackdepot.h>
 #include <linux/debugobjects.h>
 #include <linux/kallsyms.h>
 #include <linux/kfence.h>
@@ -264,8 +265,8 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
 #define TRACK_ADDRS_COUNT 16
 struct track {
 	unsigned long addr;	/* Called from address */
-#ifdef CONFIG_STACKTRACE
-	unsigned long addrs[TRACK_ADDRS_COUNT];	/* Called from address */
+#ifdef CONFIG_STACKDEPOT
+	depot_stack_handle_t handle;
 #endif
 	int cpu;		/* Was running on cpu */
 	int pid;		/* Pid context */
@@ -724,22 +725,19 @@ static struct track *get_track(struct kmem_cache *s, void *object,
 	return kasan_reset_tag(p + alloc);
 }
 
-static void set_track(struct kmem_cache *s, void *object,
+static void noinline set_track(struct kmem_cache *s, void *object,
 			enum track_item alloc, unsigned long addr)
 {
 	struct track *p = get_track(s, object, alloc);
 
-#ifdef CONFIG_STACKTRACE
+#ifdef CONFIG_STACKDEPOT
+	unsigned long entries[TRACK_ADDRS_COUNT];
 	unsigned int nr_entries;
 
-	metadata_access_enable();
-	nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
-				      TRACK_ADDRS_COUNT, 3);
-	metadata_access_disable();
-
-	if (nr_entries < TRACK_ADDRS_COUNT)
-		p->addrs[nr_entries] = 0;
+	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);
+	p->handle = stack_depot_save(entries, nr_entries, GFP_NOWAIT);
 #endif
+
 	p->addr = addr;
 	p->cpu = smp_processor_id();
 	p->pid = current->pid;
@@ -759,20 +757,19 @@ static void init_tracking(struct kmem_cache *s, void *object)
 
 static void print_track(const char *s, struct track *t, unsigned long pr_time)
 {
+	depot_stack_handle_t handle __maybe_unused;
+
 	if (!t->addr)
 		return;
 
 	pr_err("%s in %pS age=%lu cpu=%u pid=%d\n",
 	       s, (void *)t->addr, pr_time - t->when, t->cpu, t->pid);
-#ifdef CONFIG_STACKTRACE
-	{
-		int i;
-		for (i = 0; i < TRACK_ADDRS_COUNT; i++)
-			if (t->addrs[i])
-				pr_err("\t%pS\n", (void *)t->addrs[i]);
-			else
-				break;
-	}
+#ifdef CONFIG_STACKDEPOT
+	handle = READ_ONCE(t->handle);
+	if (handle)
+		stack_depot_print(handle);
+	else
+		pr_err("object allocation/free stack trace missing\n");
 #endif
 }
 
@@ -1532,6 +1529,8 @@ static int __init setup_slub_debug(char *str)
 			global_slub_debug_changed = true;
 		} else {
 			slab_list_specified = true;
+			if (flags & SLAB_STORE_USER)
+				stack_depot_want_early_init();
 		}
 	}
 
@@ -1549,6 +1548,8 @@ static int __init setup_slub_debug(char *str)
 	}
 out:
 	slub_debug = global_flags;
+	if (slub_debug & SLAB_STORE_USER)
+		stack_depot_want_early_init();
 	if (slub_debug != 0 || slub_debug_string)
 		static_branch_enable(&slub_debug_enabled);
 	else
@@ -4342,18 +4343,26 @@ void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
 	objp = fixup_red_left(s, objp);
 	trackp = get_track(s, objp, TRACK_ALLOC);
 	kpp->kp_ret = (void *)trackp->addr;
-#ifdef CONFIG_STACKTRACE
-	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
-		kpp->kp_stack[i] = (void *)trackp->addrs[i];
-		if (!kpp->kp_stack[i])
-			break;
-	}
+#ifdef CONFIG_STACKDEPOT
+	{
+		depot_stack_handle_t handle;
+		unsigned long *entries;
+		unsigned int nr_entries;
+
+		handle = READ_ONCE(trackp->handle);
+		if (handle) {
+			nr_entries = stack_depot_fetch(handle, &entries);
+			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
+				kpp->kp_stack[i] = (void *)entries[i];
+		}
 
-	trackp = get_track(s, objp, TRACK_FREE);
-	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
-		kpp->kp_free_stack[i] = (void *)trackp->addrs[i];
-		if (!kpp->kp_free_stack[i])
-			break;
+		trackp = get_track(s, objp, TRACK_FREE);
+		handle = READ_ONCE(trackp->handle);
+		if (handle) {
+			nr_entries = stack_depot_fetch(handle, &entries);
+			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
+				kpp->kp_free_stack[i] = (void *)entries[i];
+		}
 	}
 #endif
 #endif
-- 
2.35.1




* [PATCH v3 4/6] mm/slub: distinguish and print stack traces in debugfs files
  2022-04-04 16:41 [PATCH v3 0/6] SLUB debugfs improvements based on stackdepot Vlastimil Babka
                   ` (2 preceding siblings ...)
  2022-04-04 16:41 ` [PATCH v3 3/6] mm/slub: use stackdepot to save stack trace in objects Vlastimil Babka
@ 2022-04-04 16:41 ` Vlastimil Babka
  2022-04-05 21:40   ` David Rientjes
  2022-04-04 16:41 ` [PATCH v3 5/6] mm/slub: sort debugfs output by frequency of stack traces Vlastimil Babka
  2022-04-04 16:41 ` [PATCH v3 6/6] slab, documentation: add description of debugfs files for SLUB caches Vlastimil Babka
  5 siblings, 1 reply; 17+ messages in thread
From: Vlastimil Babka @ 2022-04-04 16:41 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Marco Elver, Mike Rapoport, Hyeonggon Yoo, Imran Khan,
	Vlastimil Babka

From: Oliver Glitta <glittao@gmail.com>

Aggregate objects in slub cache by unique stack trace, in addition to
the caller address, when producing the contents of the debugfs files
alloc_traces and free_traces. Also add the stack traces themselves to
the debugfs output. This makes it much more useful to e.g. debug memory
leaks.
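
Conceptually, the binary search in add_location() now orders locations
by the pair (caller address, depot handle) instead of the address alone
(illustrative snippet; see the diff for the real code):

	if (track->addr < caddr)
		end = pos;
	else if (track->addr == caddr && handle < chandle)
		end = pos;
	else
		start = pos;

so each unique combination of call site and full stack trace gets its
own aggregated entry.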

Signed-off-by: Oliver Glitta <glittao@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/slub.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 98c1450c23f0..f2e550e1adf0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5064,6 +5064,7 @@ EXPORT_SYMBOL(validate_slab_cache);
  */
 
 struct location {
+	depot_stack_handle_t handle;
 	unsigned long count;
 	unsigned long addr;
 	long long sum_time;
@@ -5116,9 +5117,13 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 {
 	long start, end, pos;
 	struct location *l;
-	unsigned long caddr;
+	unsigned long caddr, chandle;
 	unsigned long age = jiffies - track->when;
+	depot_stack_handle_t handle = 0;
 
+#ifdef CONFIG_STACKDEPOT
+	handle = READ_ONCE(track->handle);
+#endif
 	start = -1;
 	end = t->count;
 
@@ -5133,7 +5138,8 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 			break;
 
 		caddr = t->loc[pos].addr;
-		if (track->addr == caddr) {
+		chandle = t->loc[pos].handle;
+		if ((track->addr == caddr) && (handle == chandle)) {
 
 			l = &t->loc[pos];
 			l->count++;
@@ -5158,6 +5164,8 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 
 		if (track->addr < caddr)
 			end = pos;
+		else if (track->addr == caddr && handle < chandle)
+			end = pos;
 		else
 			start = pos;
 	}
@@ -5180,6 +5188,7 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 	l->max_time = age;
 	l->min_pid = track->pid;
 	l->max_pid = track->pid;
+	l->handle = handle;
 	cpumask_clear(to_cpumask(l->cpus));
 	cpumask_set_cpu(track->cpu, to_cpumask(l->cpus));
 	nodes_clear(l->nodes);
@@ -6089,6 +6098,21 @@ static int slab_debugfs_show(struct seq_file *seq, void *v)
 			seq_printf(seq, " nodes=%*pbl",
 				 nodemask_pr_args(&l->nodes));
 
+#ifdef CONFIG_STACKDEPOT
+		{
+			depot_stack_handle_t handle;
+			unsigned long *entries;
+			unsigned int nr_entries, j;
+
+			handle = READ_ONCE(l->handle);
+			if (handle) {
+				nr_entries = stack_depot_fetch(handle, &entries);
+				seq_puts(seq, "\n");
+				for (j = 0; j < nr_entries; j++)
+					seq_printf(seq, "        %pS\n", (void *)entries[j]);
+			}
+		}
+#endif
 		seq_puts(seq, "\n");
 	}
 
-- 
2.35.1




* [PATCH v3 5/6] mm/slub: sort debugfs output by frequency of stack traces
  2022-04-04 16:41 [PATCH v3 0/6] SLUB debugfs improvements based on stackdepot Vlastimil Babka
                   ` (3 preceding siblings ...)
  2022-04-04 16:41 ` [PATCH v3 4/6] mm/slub: distinguish and print stack traces in debugfs files Vlastimil Babka
@ 2022-04-04 16:41 ` Vlastimil Babka
  2022-04-05 21:40   ` David Rientjes
  2022-04-04 16:41 ` [PATCH v3 6/6] slab, documentation: add description of debugfs files for SLUB caches Vlastimil Babka
  5 siblings, 1 reply; 17+ messages in thread
From: Vlastimil Babka @ 2022-04-04 16:41 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Marco Elver, Mike Rapoport, Hyeonggon Yoo, Imran Khan,
	Vlastimil Babka

From: Oliver Glitta <glittao@gmail.com>

Sort the output of debugfs alloc_traces and free_traces by the frequency
of allocation/freeing stack traces. Most frequently used stack traces
will be printed first, e.g. for easier memory leak debugging.
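
The comparator returns -1 whenever the first location has the higher
count (see the diff), so sort_r() yields descending order, i.e. the
most frequent trace comes first.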

Signed-off-by: Oliver Glitta <glittao@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/slub.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index f2e550e1adf0..2963dc123336 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -38,6 +38,7 @@
 #include <linux/memcontrol.h>
 #include <linux/random.h>
 #include <kunit/test.h>
+#include <linux/sort.h>
 
 #include <linux/debugfs.h>
 #include <trace/events/kmem.h>
@@ -6137,6 +6138,17 @@ static void *slab_debugfs_next(struct seq_file *seq, void *v, loff_t *ppos)
 	return NULL;
 }
 
+static int cmp_loc_by_count(const void *a, const void *b, const void *data)
+{
+	struct location *loc1 = (struct location *)a;
+	struct location *loc2 = (struct location *)b;
+
+	if (loc1->count > loc2->count)
+		return -1;
+	else
+		return 1;
+}
+
 static void *slab_debugfs_start(struct seq_file *seq, loff_t *ppos)
 {
 	struct loc_track *t = seq->private;
@@ -6198,6 +6210,10 @@ static int slab_debug_trace_open(struct inode *inode, struct file *filep)
 		spin_unlock_irqrestore(&n->list_lock, flags);
 	}
 
+	/* Sort locations by count */
+	sort_r(t->loc, t->count, sizeof(struct location),
+		cmp_loc_by_count, NULL, NULL);
+
 	bitmap_free(obj_map);
 	return 0;
 }
-- 
2.35.1




* [PATCH v3 6/6] slab, documentation: add description of debugfs files for SLUB caches
  2022-04-04 16:41 [PATCH v3 0/6] SLUB debugfs improvements based on stackdepot Vlastimil Babka
                   ` (4 preceding siblings ...)
  2022-04-04 16:41 ` [PATCH v3 5/6] mm/slub: sort debugfs output by frequency of stack traces Vlastimil Babka
@ 2022-04-04 16:41 ` Vlastimil Babka
  2022-04-05 21:40   ` David Rientjes
  5 siblings, 1 reply; 17+ messages in thread
From: Vlastimil Babka @ 2022-04-04 16:41 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Marco Elver, Mike Rapoport, Hyeonggon Yoo, Imran Khan,
	Vlastimil Babka, Jonathan Corbet, Randy Dunlap, linux-doc

From: Oliver Glitta <glittao@gmail.com>

Add description of debugfs files alloc_traces and free_traces
to SLUB cache documentation.
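
A quick way to try it out (example invocation, not part of the patch):
boot with e.g. slub_debug=U,kmalloc-64 to enable user tracking for a
single cache, then read the new files:

	# cat /sys/kernel/debug/slab/kmalloc-64/alloc_traces
	# cat /sys/kernel/debug/slab/kmalloc-64/free_traces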

[ vbabka@suse.cz: some rewording ]

Signed-off-by: Oliver Glitta <glittao@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
---
 Documentation/vm/slub.rst | 64 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index d3028554b1e9..43063ade737a 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -384,5 +384,69 @@ c) Execute ``slabinfo-gnuplot.sh`` in '-t' mode, passing all of the
       40,60`` range will plot only samples collected between 40th and
       60th seconds).
 
+
+DebugFS files for SLUB
+======================
+
+For more information about the current state of SLUB caches with the user
+tracking debug option enabled, debugfs files are available, typically under
+/sys/kernel/debug/slab/<cache>/ (created only for caches with user tracking
+enabled). There are two types of these files with the following debug
+information:
+
+1. alloc_traces::
+
+    Prints information about unique allocation traces of the currently
+    allocated objects. The output is sorted by frequency of each trace.
+
+    Information in the output:
+    Number of objects, allocating function, minimal/average/maximal jiffies since alloc,
+    pid range of the allocating processes, cpu mask of allocating cpus, and stack trace.
+
+    Example:::
+
+    1085 populate_error_injection_list+0x97/0x110 age=166678/166680/166682 pid=1 cpus=1::
+	__slab_alloc+0x6d/0x90
+	kmem_cache_alloc_trace+0x2eb/0x300
+	populate_error_injection_list+0x97/0x110
+	init_error_injection+0x1b/0x71
+	do_one_initcall+0x5f/0x2d0
+	kernel_init_freeable+0x26f/0x2d7
+	kernel_init+0xe/0x118
+	ret_from_fork+0x22/0x30
+
+
+2. free_traces::
+
+    Prints information about unique freeing traces of the currently allocated
+    objects. The freeing traces thus come from the previous life-cycle of the
+    objects and are reported as not available for objects allocated for the first
+    time. The output is sorted by frequency of each trace.
+
+    Information in the output:
+    Number of objects, freeing function, minimal/average/maximal jiffies since free,
+    pid range of the freeing processes, cpu mask of freeing cpus, and stack trace.
+
+    Example:::
+
+    1980 <not-available> age=4294912290 pid=0 cpus=0
+    51 acpi_ut_update_ref_count+0x6a6/0x782 age=236886/237027/237772 pid=1 cpus=1
+	kfree+0x2db/0x420
+	acpi_ut_update_ref_count+0x6a6/0x782
+	acpi_ut_update_object_reference+0x1ad/0x234
+	acpi_ut_remove_reference+0x7d/0x84
+	acpi_rs_get_prt_method_data+0x97/0xd6
+	acpi_get_irq_routing_table+0x82/0xc4
+	acpi_pci_irq_find_prt_entry+0x8e/0x2e0
+	acpi_pci_irq_lookup+0x3a/0x1e0
+	acpi_pci_irq_enable+0x77/0x240
+	pcibios_enable_device+0x39/0x40
+	do_pci_enable_device.part.0+0x5d/0xe0
+	pci_enable_device_flags+0xfc/0x120
+	pci_enable_device+0x13/0x20
+	virtio_pci_probe+0x9e/0x170
+	local_pci_probe+0x48/0x80
+	pci_device_probe+0x105/0x1c0
+
 Christoph Lameter, May 30, 2007
 Sergey Senozhatsky, October 23, 2015
-- 
2.35.1




* Re: [PATCH v3 1/6] lib/stackdepot: allow requesting early initialization dynamically
  2022-04-04 16:41 ` [PATCH v3 1/6] lib/stackdepot: allow requesting early initialization dynamically Vlastimil Babka
@ 2022-04-05 21:40   ` David Rientjes
  2022-04-06  8:55     ` Vlastimil Babka
  2022-04-06 12:21     ` Mike Rapoport
  0 siblings, 2 replies; 17+ messages in thread
From: David Rientjes @ 2022-04-05 21:40 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Joonsoo Kim, Pekka Enberg, Roman Gushchin,
	Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Marco Elver, Mike Rapoport, Hyeonggon Yoo, Imran Khan

On Mon, 4 Apr 2022, Vlastimil Babka wrote:

> In a later patch we want to add stackdepot support for object owner
> tracking in slub caches, which is enabled by the slub_debug boot parameter.
> This creates a bootstrap problem as some caches are created early in
> boot when slab_is_available() is false and thus stack_depot_init()
> tries to use memblock. But, as reported by Hyeonggon Yoo [1], we are
> already beyond memblock_free_all(). Ideally the memblock allocation should
> fail, yet it succeeds, but later the system crashes, which is a
> separately handled issue.
> 
> To resolve this bootstrap issue in a robust way, this patch adds another
> way to request stack_depot_early_init(), which happens at a well-defined
> point in time. In addition to the build-time CONFIG_STACKDEPOT_ALWAYS_INIT,
> code that's e.g. processing boot parameters (which happens early enough)
> can call a new function stack_depot_want_early_init(), which sets a flag
> that stack_depot_early_init() will check.
> 
> In this patch we also convert page_owner to this approach. While it
> doesn't have the same bootstrap issue as slub, it's also a functionality
> enabled by a boot param and can thus request stack_depot_early_init()
> with memblock allocation instead of later initialization with
> kvmalloc().
> 
> As suggested by Mike, make stack_depot_early_init() only attempt
> memblock allocation and stack_depot_init() only attempt kvmalloc().
> Also change the latter to kvcalloc(). In both cases we can lose the
> explicit array zeroing, which the allocations do already.
> 
> As suggested by Marco, provide empty implementations of the init
> functions for !CONFIG_STACKDEPOT builds to simplify the callers.
> 
> [1] https://lore.kernel.org/all/YhnUcqyeMgCrWZbd@ip-172-31-19-208.ap-northeast-1.compute.internal/
> 
> Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
> Suggested-by: Marco Elver <elver@google.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Reviewed-by: Marco Elver <elver@google.com>
> Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  include/linux/stackdepot.h | 26 ++++++++++++---
>  lib/stackdepot.c           | 66 +++++++++++++++++++++++++-------------
>  mm/page_owner.c            |  9 ++++--
>  3 files changed, 72 insertions(+), 29 deletions(-)
> 
> diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
> index 17f992fe6355..bc2797955de9 100644
> --- a/include/linux/stackdepot.h
> +++ b/include/linux/stackdepot.h
> @@ -20,18 +20,36 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>  					gfp_t gfp_flags, bool can_alloc);
>  
>  /*
> - * Every user of stack depot has to call this during its own init when it's
> - * decided that it will be calling stack_depot_save() later.
> + * Every user of stack depot has to call stack_depot_init() during its own init
> + * when it's decided that it will be calling stack_depot_save() later. This is
> + * recommended for e.g. modules initialized later in the boot process, when
> + * slab_is_available() is true.
>   *
>   * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot
>   * enabled as part of mm_init(), for subsystems where it's known at compile time
>   * that stack depot will be used.
> + *
> + * Another alternative is to call stack_depot_want_early_init(), when the
> + * decision to use stack depot is taken e.g. when evaluating kernel boot
> + * parameters, which precedes the enablement point in mm_init().
> + *
> + * stack_depot_init() and stack_depot_want_early_init() can be called regardless
> + * of CONFIG_STACKDEPOT and are no-op when disabled. The actual save/fetch/print
> + * functions should only be called from code that makes sure CONFIG_STACKDEPOT
> + * is enabled.
>   */
> +#ifdef CONFIG_STACKDEPOT
>  int stack_depot_init(void);
>  
> -#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
> -static inline int stack_depot_early_init(void)	{ return stack_depot_init(); }
> +void __init stack_depot_want_early_init(void);
> +
> +/* This is supposed to be called only from mm_init() */
> +int __init stack_depot_early_init(void);
>  #else
> +static inline int stack_depot_init(void) { return 0; }
> +
> +static inline void stack_depot_want_early_init(void) { }
> +
>  static inline int stack_depot_early_init(void)	{ return 0; }
>  #endif
>  
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index bf5ba9af0500..6c4644c9ed44 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -66,6 +66,9 @@ struct stack_record {
>  	unsigned long entries[];	/* Variable-sized array of entries. */
>  };
>  
> +static bool __stack_depot_want_early_init __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT);
> +static bool __stack_depot_early_init_passed __initdata;
> +
>  static void *stack_slabs[STACK_ALLOC_MAX_SLABS];
>  
>  static int depot_index;
> @@ -162,38 +165,57 @@ static int __init is_stack_depot_disabled(char *str)
>  }
>  early_param("stack_depot_disable", is_stack_depot_disabled);
>  
> -/*
> - * __ref because of memblock_alloc(), which will not be actually called after
> - * the __init code is gone, because at that point slab_is_available() is true
> - */
> -__ref int stack_depot_init(void)
> +void __init stack_depot_want_early_init(void)
> +{
> +	/* Too late to request early init now */
> +	WARN_ON(__stack_depot_early_init_passed);
> +
> +	__stack_depot_want_early_init = true;
> +}
> +
> +int __init stack_depot_early_init(void)
> +{
> +	size_t size;
> +
> +	/* This is supposed to be called only once, from mm_init() */
> +	if (WARN_ON(__stack_depot_early_init_passed))
> +		return 0;
> +
> +	__stack_depot_early_init_passed = true;
> +
> +	if (!__stack_depot_want_early_init || stack_depot_disable)
> +		return 0;
> +
> +	pr_info("Stack Depot early init allocating hash table with memblock_alloc\n");
> +	size = (STACK_HASH_SIZE * sizeof(struct stack_record *));

I think the kvcalloc() in the main init path is very unlikely to fail, but 
perhaps this memblock_alloc() might?  If so, a nit might be to include 
this size as part of the printk.

Either way:

Acked-by: David Rientjes <rientjes@google.com>



* Re: [PATCH v3 2/6] mm/slub: move struct track init out of set_track()
  2022-04-04 16:41 ` [PATCH v3 2/6] mm/slub: move struct track init out of set_track() Vlastimil Babka
@ 2022-04-05 21:40   ` David Rientjes
  0 siblings, 0 replies; 17+ messages in thread
From: David Rientjes @ 2022-04-05 21:40 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Joonsoo Kim, Pekka Enberg, Roman Gushchin,
	Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Marco Elver, Mike Rapoport, Hyeonggon Yoo, Imran Khan

On Mon, 4 Apr 2022, Vlastimil Babka wrote:

> set_track() either zeroes out the struct track or fills it, depending on
> the addr parameter. This is unnecessary as there's only one place that
> calls it for the initialization - init_tracking(). We can simply do the
> zeroing there, with a single memset() that covers both TRACK_ALLOC and
> TRACK_FREE as they are adjacent.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Acked-by: David Rientjes <rientjes@google.com>



* Re: [PATCH v3 3/6] mm/slub: use stackdepot to save stack trace in objects
  2022-04-04 16:41 ` [PATCH v3 3/6] mm/slub: use stackdepot to save stack trace in objects Vlastimil Babka
@ 2022-04-05 21:40   ` David Rientjes
  2022-04-06  9:03     ` Vlastimil Babka
  0 siblings, 1 reply; 17+ messages in thread
From: David Rientjes @ 2022-04-05 21:40 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Joonsoo Kim, Pekka Enberg, Roman Gushchin,
	Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Marco Elver, Mike Rapoport, Hyeonggon Yoo, Imran Khan

On Mon, 4 Apr 2022, Vlastimil Babka wrote:

> From: Oliver Glitta <glittao@gmail.com>
> 
> Many stack traces are similar, so there are many similar arrays.
> Stackdepot saves each unique stack only once.
> 
> Replace field addrs in struct track with depot_stack_handle_t handle.  Use
> stackdepot to save stack trace.
> 
> The benefits are a smaller memory overhead (in each struct track, a
> 4-byte depot handle replaces TRACK_ADDRS_COUNT == 16 inline addresses,
> i.e. 128 bytes on 64-bit) and the possibility to aggregate per-cache
> statistics in the following patch using the stackdepot handle instead
> of matching stacks manually.
> 
> [ vbabka@suse.cz: rebase to 5.17-rc1 and adjust accordingly ]
> 
> This was initially merged as commit 788691464c29 and reverted by commit
> ae14c63a9f20 due to several issues that should now be fixed.
> The problem of unconditional memory overhead by stackdepot has been
> addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init
> and stack_table allocation by kvmalloc()"), so the dependency on
> stackdepot will result in extra memory usage only when slab cache
> tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
> The build failures on some architectures were also addressed, and the
> reported issue with the xfs/433 test did not reproduce on 5.17-rc1 with
> this patch.
> 
> Signed-off-by: Oliver Glitta <glittao@gmail.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  init/Kconfig      |  1 +
>  lib/Kconfig.debug |  1 +
>  mm/slab_common.c  |  5 ++++
>  mm/slub.c         | 71 ++++++++++++++++++++++++++---------------------
>  4 files changed, 47 insertions(+), 31 deletions(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index ddcbefe535e9..adc57f989d87 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1875,6 +1875,7 @@ config SLUB_DEBUG
>  	default y
>  	bool "Enable SLUB debugging support" if EXPERT
>  	depends on SLUB && SYSFS
> +	select STACKDEPOT if STACKTRACE_SUPPORT
>  	help
>  	  SLUB has extensive debug support features. Disabling these can
>  	  result in significant savings in code size. This also disables
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 075cd25363ac..78d6139111cd 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -709,6 +709,7 @@ config DEBUG_SLAB
>  config SLUB_DEBUG_ON
>  	bool "SLUB debugging on by default"
>  	depends on SLUB && SLUB_DEBUG
> +	select STACKDEPOT_ALWAYS_INIT if STACKTRACE_SUPPORT
>  	default n
>  	help
>  	  Boot with debugging on by default. SLUB boots by default with
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 6ee64d6208b3..73943479a2b7 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -24,6 +24,7 @@
>  #include <asm/tlbflush.h>
>  #include <asm/page.h>
>  #include <linux/memcontrol.h>
> +#include <linux/stackdepot.h>
>  
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/kmem.h>
> @@ -314,9 +315,13 @@ kmem_cache_create_usercopy(const char *name,
>  	 * If no slub_debug was enabled globally, the static key is not yet
>  	 * enabled by setup_slub_debug(). Enable it if the cache is being
>  	 * created with any of the debugging flags passed explicitly.
> +	 * It's also possible that this is the first cache created with
> +	 * SLAB_STORE_USER and we should init stack_depot for it.
>  	 */
>  	if (flags & SLAB_DEBUG_FLAGS)
>  		static_branch_enable(&slub_debug_enabled);
> +	if (flags & SLAB_STORE_USER)
> +		stack_depot_init();
>  #endif
>  
>  	mutex_lock(&slab_mutex);
> diff --git a/mm/slub.c b/mm/slub.c
> index cd4fd0159911..98c1450c23f0 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -26,6 +26,7 @@
>  #include <linux/cpuset.h>
>  #include <linux/mempolicy.h>
>  #include <linux/ctype.h>
> +#include <linux/stackdepot.h>
>  #include <linux/debugobjects.h>
>  #include <linux/kallsyms.h>
>  #include <linux/kfence.h>
> @@ -264,8 +265,8 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
>  #define TRACK_ADDRS_COUNT 16
>  struct track {
>  	unsigned long addr;	/* Called from address */
> -#ifdef CONFIG_STACKTRACE
> -	unsigned long addrs[TRACK_ADDRS_COUNT];	/* Called from address */
> +#ifdef CONFIG_STACKDEPOT
> +	depot_stack_handle_t handle;
>  #endif
>  	int cpu;		/* Was running on cpu */
>  	int pid;		/* Pid context */
> @@ -724,22 +725,19 @@ static struct track *get_track(struct kmem_cache *s, void *object,
>  	return kasan_reset_tag(p + alloc);
>  }
>  
> -static void set_track(struct kmem_cache *s, void *object,
> +static void noinline set_track(struct kmem_cache *s, void *object,
>  			enum track_item alloc, unsigned long addr)
>  {
>  	struct track *p = get_track(s, object, alloc);
>  
> -#ifdef CONFIG_STACKTRACE
> +#ifdef CONFIG_STACKDEPOT
> +	unsigned long entries[TRACK_ADDRS_COUNT];
>  	unsigned int nr_entries;
>  
> -	metadata_access_enable();
> -	nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
> -				      TRACK_ADDRS_COUNT, 3);
> -	metadata_access_disable();
> -
> -	if (nr_entries < TRACK_ADDRS_COUNT)
> -		p->addrs[nr_entries] = 0;
> +	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);
> +	p->handle = stack_depot_save(entries, nr_entries, GFP_NOWAIT);

I think this should also have __GFP_NOWARN set since this allocation could 
easily fail and it would unnecessarily spam the kernel log unless we 
actually care about the stack trace being printed later (and the patch 
already indicates the allocation failed in print_track() when it matters).

Otherwise:

Acked-by: David Rientjes <rientjes@google.com>

>  #endif
> +
>  	p->addr = addr;
>  	p->cpu = smp_processor_id();
>  	p->pid = current->pid;
> @@ -759,20 +757,19 @@ static void init_tracking(struct kmem_cache *s, void *object)
>  
>  static void print_track(const char *s, struct track *t, unsigned long pr_time)
>  {
> +	depot_stack_handle_t handle __maybe_unused;
> +
>  	if (!t->addr)
>  		return;
>  
>  	pr_err("%s in %pS age=%lu cpu=%u pid=%d\n",
>  	       s, (void *)t->addr, pr_time - t->when, t->cpu, t->pid);
> -#ifdef CONFIG_STACKTRACE
> -	{
> -		int i;
> -		for (i = 0; i < TRACK_ADDRS_COUNT; i++)
> -			if (t->addrs[i])
> -				pr_err("\t%pS\n", (void *)t->addrs[i]);
> -			else
> -				break;
> -	}
> +#ifdef CONFIG_STACKDEPOT
> +	handle = READ_ONCE(t->handle);
> +	if (handle)
> +		stack_depot_print(handle);
> +	else
> +		pr_err("object allocation/free stack trace missing\n");
>  #endif
>  }
>  
> @@ -1532,6 +1529,8 @@ static int __init setup_slub_debug(char *str)
>  			global_slub_debug_changed = true;
>  		} else {
>  			slab_list_specified = true;
> +			if (flags & SLAB_STORE_USER)
> +				stack_depot_want_early_init();
>  		}
>  	}
>  
> @@ -1549,6 +1548,8 @@ static int __init setup_slub_debug(char *str)
>  	}
>  out:
>  	slub_debug = global_flags;
> +	if (slub_debug & SLAB_STORE_USER)
> +		stack_depot_want_early_init();
>  	if (slub_debug != 0 || slub_debug_string)
>  		static_branch_enable(&slub_debug_enabled);
>  	else
> @@ -4342,18 +4343,26 @@ void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
>  	objp = fixup_red_left(s, objp);
>  	trackp = get_track(s, objp, TRACK_ALLOC);
>  	kpp->kp_ret = (void *)trackp->addr;
> -#ifdef CONFIG_STACKTRACE
> -	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
> -		kpp->kp_stack[i] = (void *)trackp->addrs[i];
> -		if (!kpp->kp_stack[i])
> -			break;
> -	}
> +#ifdef CONFIG_STACKDEPOT
> +	{
> +		depot_stack_handle_t handle;
> +		unsigned long *entries;
> +		unsigned int nr_entries;
> +
> +		handle = READ_ONCE(trackp->handle);
> +		if (handle) {
> +			nr_entries = stack_depot_fetch(handle, &entries);
> +			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
> +				kpp->kp_stack[i] = (void *)entries[i];
> +		}
>  
> -	trackp = get_track(s, objp, TRACK_FREE);
> -	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
> -		kpp->kp_free_stack[i] = (void *)trackp->addrs[i];
> -		if (!kpp->kp_free_stack[i])
> -			break;
> +		trackp = get_track(s, objp, TRACK_FREE);
> +		handle = READ_ONCE(trackp->handle);
> +		if (handle) {
> +			nr_entries = stack_depot_fetch(handle, &entries);
> +			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
> +				kpp->kp_free_stack[i] = (void *)entries[i];
> +		}
>  	}
>  #endif
>  #endif
> -- 
> 2.35.1
> 
> 



* Re: [PATCH v3 4/6] mm/slub: distinguish and print stack traces in debugfs files
  2022-04-04 16:41 ` [PATCH v3 4/6] mm/slub: distinguish and print stack traces in debugfs files Vlastimil Babka
@ 2022-04-05 21:40   ` David Rientjes
  2022-04-06  9:09     ` Vlastimil Babka
  0 siblings, 1 reply; 17+ messages in thread
From: David Rientjes @ 2022-04-05 21:40 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Joonsoo Kim, Pekka Enberg, Roman Gushchin,
	Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Marco Elver, Mike Rapoport, Hyeonggon Yoo, Imran Khan

On Mon, 4 Apr 2022, Vlastimil Babka wrote:

> From: Oliver Glitta <glittao@gmail.com>
> 
> Aggregate objects in slub cache by unique stack trace, in addition to
> the caller address, when producing the contents of the debugfs files
> alloc_traces and free_traces. Also add the stack traces themselves to
> the debugfs output. This makes it much more useful to e.g. debug memory
> leaks.
> 
> Signed-off-by: Oliver Glitta <glittao@gmail.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> ---
>  mm/slub.c | 28 ++++++++++++++++++++++++++--
>  1 file changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 98c1450c23f0..f2e550e1adf0 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5064,6 +5064,7 @@ EXPORT_SYMBOL(validate_slab_cache);
>   */
>  
>  struct location {
> +	depot_stack_handle_t handle;
>  	unsigned long count;
>  	unsigned long addr;
>  	long long sum_time;
> @@ -5116,9 +5117,13 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
>  {
>  	long start, end, pos;
>  	struct location *l;
> -	unsigned long caddr;
> +	unsigned long caddr, chandle;
>  	unsigned long age = jiffies - track->when;
> +	depot_stack_handle_t handle = 0;
>  
> +#ifdef CONFIG_STACKDEPOT
> +	handle = READ_ONCE(track->handle);
> +#endif
>  	start = -1;
>  	end = t->count;
>  
> @@ -5133,7 +5138,8 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
>  			break;
>  
>  		caddr = t->loc[pos].addr;
> -		if (track->addr == caddr) {
> +		chandle = t->loc[pos].handle;
> +		if ((track->addr == caddr) && (handle == chandle)) {
>  
>  			l = &t->loc[pos];
>  			l->count++;
> @@ -5158,6 +5164,8 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
>  
>  		if (track->addr < caddr)
>  			end = pos;
> +		else if (track->addr == caddr && handle < chandle)
> +			end = pos;
>  		else
>  			start = pos;
>  	}

Does this need to properly handle the case where handle == NULL?

> @@ -5180,6 +5188,7 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
>  	l->max_time = age;
>  	l->min_pid = track->pid;
>  	l->max_pid = track->pid;
> +	l->handle = handle;
>  	cpumask_clear(to_cpumask(l->cpus));
>  	cpumask_set_cpu(track->cpu, to_cpumask(l->cpus));
>  	nodes_clear(l->nodes);
> @@ -6089,6 +6098,21 @@ static int slab_debugfs_show(struct seq_file *seq, void *v)
>  			seq_printf(seq, " nodes=%*pbl",
>  				 nodemask_pr_args(&l->nodes));
>  
> +#ifdef CONFIG_STACKDEPOT
> +		{
> +			depot_stack_handle_t handle;
> +			unsigned long *entries;
> +			unsigned int nr_entries, j;
> +
> +			handle = READ_ONCE(l->handle);
> +			if (handle) {
> +				nr_entries = stack_depot_fetch(handle, &entries);
> +				seq_puts(seq, "\n");
> +				for (j = 0; j < nr_entries; j++)
> +					seq_printf(seq, "        %pS\n", (void *)entries[j]);
> +			}
> +		}
> +#endif
>  		seq_puts(seq, "\n");
>  	}
>  
> -- 
> 2.35.1
> 
> 



* Re: [PATCH v3 5/6] mm/slub: sort debugfs output by frequency of stack traces
  2022-04-04 16:41 ` [PATCH v3 5/6] mm/slub: sort debugfs output by frequency of stack traces Vlastimil Babka
@ 2022-04-05 21:40   ` David Rientjes
  0 siblings, 0 replies; 17+ messages in thread
From: David Rientjes @ 2022-04-05 21:40 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Joonsoo Kim, Pekka Enberg, Roman Gushchin,
	Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Marco Elver, Mike Rapoport, Hyeonggon Yoo, Imran Khan

On Mon, 4 Apr 2022, Vlastimil Babka wrote:

> From: Oliver Glitta <glittao@gmail.com>
> 
> Sort the output of debugfs alloc_traces and free_traces by the frequency
> of allocation/freeing stack traces. Most frequently used stack traces
> will be printed first, e.g. for easier memory leak debugging.
> 
> Signed-off-by: Oliver Glitta <glittao@gmail.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Acked-by: David Rientjes <rientjes@google.com>



* Re: [PATCH v3 6/6] slab, documentation: add description of debugfs files for SLUB caches
  2022-04-04 16:41 ` [PATCH v3 6/6] slab, documentation: add description of debugfs files for SLUB caches Vlastimil Babka
@ 2022-04-05 21:40   ` David Rientjes
  0 siblings, 0 replies; 17+ messages in thread
From: David Rientjes @ 2022-04-05 21:40 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Joonsoo Kim, Pekka Enberg, Roman Gushchin,
	Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Marco Elver, Mike Rapoport, Hyeonggon Yoo, Imran Khan,
	Jonathan Corbet, Randy Dunlap, linux-doc

On Mon, 4 Apr 2022, Vlastimil Babka wrote:

> From: Oliver Glitta <glittao@gmail.com>
> 
> Add description of debugfs files alloc_traces and free_traces
> to SLUB cache documentation.
> 
> [ vbabka@suse.cz: some rewording ]
> 
> Signed-off-by: Oliver Glitta <glittao@gmail.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: linux-doc@vger.kernel.org
> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Acked-by: Mike Rapoport <rppt@linux.ibm.com>

Acked-by: David Rientjes <rientjes@google.com>



* Re: [PATCH v3 1/6] lib/stackdepot: allow requesting early initialization dynamically
  2022-04-05 21:40   ` David Rientjes
@ 2022-04-06  8:55     ` Vlastimil Babka
  2022-04-06 12:21     ` Mike Rapoport
  1 sibling, 0 replies; 17+ messages in thread
From: Vlastimil Babka @ 2022-04-06  8:55 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, Joonsoo Kim, Pekka Enberg, Roman Gushchin,
	Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Marco Elver, Mike Rapoport, Hyeonggon Yoo, Imran Khan

On 4/5/22 23:40, David Rientjes wrote:
> On Mon, 4 Apr 2022, Vlastimil Babka wrote:
>> +int __init stack_depot_early_init(void)
>> +{
>> +	size_t size;
>> +
>> +	/* This is supposed to be called only once, from mm_init() */
>> +	if (WARN_ON(__stack_depot_early_init_passed))
>> +		return 0;
>> +
>> +	__stack_depot_early_init_passed = true;
>> +
>> +	if (!__stack_depot_want_early_init || stack_depot_disable)
>> +		return 0;
>> +
>> +	pr_info("Stack Depot early init allocating hash table with memblock_alloc\n");
>> +	size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
> 
> I think the kvcalloc() in the main init path is very unlikely to fail, but 
> perhaps this memblock_alloc() might?  If so, a nit might be to include 
> this size as part of the printk.

OK, I've added the hunk at the end of this mail. Example output:

[0.062264] Stack Depot early init allocating hash table with memblock_alloc, 8388608 bytes

> Either way:
> 
> Acked-by: David Rientjes <rientjes@google.com>

Thanks!

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 6c4644c9ed44..5ca0d086ef4a 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -186,8 +186,9 @@ int __init stack_depot_early_init(void)
        if (!__stack_depot_want_early_init || stack_depot_disable)
                return 0;
 
-       pr_info("Stack Depot early init allocating hash table with memblock_alloc\n");
        size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
+       pr_info("Stack Depot early init allocating hash table with memblock_alloc, %zu bytes\n",
+               size);
        stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
 
        if (!stack_table) {
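
For scale, the 8388608 bytes in the example above match the defaults:
with CONFIG_STACK_HASH_ORDER at its default of 20 and 8-byte pointers on
a 64-bit kernel,

	size = STACK_HASH_SIZE * sizeof(struct stack_record *)
	     = (1UL << 20) * 8 = 8388608 bytes (8 MiB)

so a smaller configured hash order shrinks the table accordingly.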



* Re: [PATCH v3 3/6] mm/slub: use stackdepot to save stack trace in objects
  2022-04-05 21:40   ` David Rientjes
@ 2022-04-06  9:03     ` Vlastimil Babka
  0 siblings, 0 replies; 17+ messages in thread
From: Vlastimil Babka @ 2022-04-06  9:03 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, Joonsoo Kim, Pekka Enberg, Roman Gushchin,
	Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Marco Elver, Mike Rapoport, Hyeonggon Yoo, Imran Khan

On 4/5/22 23:40, David Rientjes wrote:
>> -static void set_track(struct kmem_cache *s, void *object,
>> +static void noinline set_track(struct kmem_cache *s, void *object,
>>  			enum track_item alloc, unsigned long addr)
>>  {
>>  	struct track *p = get_track(s, object, alloc);
>>  
>> -#ifdef CONFIG_STACKTRACE
>> +#ifdef CONFIG_STACKDEPOT
>> +	unsigned long entries[TRACK_ADDRS_COUNT];
>>  	unsigned int nr_entries;
>>  
>> -	metadata_access_enable();
>> -	nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
>> -				      TRACK_ADDRS_COUNT, 3);
>> -	metadata_access_disable();
>> -
>> -	if (nr_entries < TRACK_ADDRS_COUNT)
>> -		p->addrs[nr_entries] = 0;
>> +	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);
>> +	p->handle = stack_depot_save(entries, nr_entries, GFP_NOWAIT);
> 
> I think this should also have __GFP_NOWARN set since this allocation could 
> easily fail and it would unnecessarily spam the kernel log unless we 
> actually care about the stack trace being printed later (and the patch 
> already indicates the allocation failed in print_track() when it matters).

Good point. But it turns out __stack_depot_save() adds it for us already.

> Otherwise:
> 
> Acked-by: David Rientjes <rientjes@google.com>

Thanks!
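
For reference, this is the part of __stack_depot_save() that makes an
explicit __GFP_NOWARN unnecessary, paraphrased from lib/stackdepot.c
around v5.18-rc1 (a sketch, not a verbatim quote):

	/* before growing the depot with alloc_pages(): */
	alloc_flags &= ~GFP_ZONEMASK;			/* no zone modifiers needed */
	alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);	/* keep only context flags */
	alloc_flags |= __GFP_NOWARN;			/* failure is handled quietly */

So every caller, including set_track() above, gets the no-warn behavior
for free.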



* Re: [PATCH v3 4/6] mm/slub: distinguish and print stack traces in debugfs files
  2022-04-05 21:40   ` David Rientjes
@ 2022-04-06  9:09     ` Vlastimil Babka
  0 siblings, 0 replies; 17+ messages in thread
From: Vlastimil Babka @ 2022-04-06  9:09 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, Joonsoo Kim, Pekka Enberg, Roman Gushchin,
	Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Marco Elver, Mike Rapoport, Hyeonggon Yoo, Imran Khan

On 4/5/22 23:40, David Rientjes wrote:
>> @@ -5116,9 +5117,13 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
>>  {
>>  	long start, end, pos;
>>  	struct location *l;
>> -	unsigned long caddr;
>> +	unsigned long caddr, chandle;
>>  	unsigned long age = jiffies - track->when;
>> +	depot_stack_handle_t handle = 0;
>>  
>> +#ifdef CONFIG_STACKDEPOT
>> +	handle = READ_ONCE(track->handle);
>> +#endif
>>  	start = -1;
>>  	end = t->count;
>>  
>> @@ -5133,7 +5138,8 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
>>  			break;
>>  
>>  		caddr = t->loc[pos].addr;
>> -		if (track->addr == caddr) {
>> +		chandle = t->loc[pos].handle;
>> +		if ((track->addr == caddr) && (handle == chandle)) {
>>  
>>  			l = &t->loc[pos];
>>  			l->count++;
>> @@ -5158,6 +5164,8 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
>>  
>>  		if (track->addr < caddr)
>>  			end = pos;
>> +		else if (track->addr == caddr && handle < chandle)
>> +			end = pos;
>>  		else
>>  			start = pos;
>>  	}
> 
> Does this need to properly handle the case where handle == NULL?

Hm, I can't think of a more proper way to handle it. If objects have the
same track->addr (which is the immediate caller) and also the same NULL
handle, they will be counted together. I think that's the best we can do?
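
In other words, add_location() keys locations by the (addr, handle) pair,
and a zero handle is just another key value. A minimal sketch of the
ordering the binary search relies on (illustrative only, not the kernel
code itself):

	static int loc_cmp(unsigned long a_addr, depot_stack_handle_t a_handle,
			   unsigned long b_addr, depot_stack_handle_t b_handle)
	{
		if (a_addr != b_addr)
			return a_addr < b_addr ? -1 : 1;
		if (a_handle != b_handle)
			return a_handle < b_handle ? -1 : 1;
		return 0;	/* same caller, same (possibly zero) handle: merge */
	}

So when stack_depot_save() fails for several objects allocated from the
same caller, they all carry handle == 0 and collapse into one entry,
which is the same per-caller granularity the files had before stack depot.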



* Re: [PATCH v3 1/6] lib/stackdepot: allow requesting early initialization dynamically
  2022-04-05 21:40   ` David Rientjes
  2022-04-06  8:55     ` Vlastimil Babka
@ 2022-04-06 12:21     ` Mike Rapoport
  1 sibling, 0 replies; 17+ messages in thread
From: Mike Rapoport @ 2022-04-06 12:21 UTC (permalink / raw)
  To: David Rientjes
  Cc: Vlastimil Babka, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Marco Elver, Hyeonggon Yoo, Imran Khan

On Tue, Apr 05, 2022 at 02:40:03PM -0700, David Rientjes wrote:
> On Mon, 4 Apr 2022, Vlastimil Babka wrote:
> 
> > In a later patch we want to add stackdepot support for object owner
> > tracking in slub caches, which is enabled by slub_debug boot parameter.
> > This creates a bootstrap problem as some caches are created early in
> > boot when slab_is_available() is false and thus stack_depot_init()
> > tries to use memblock. But, as reported by Hyeonggon Yoo [1] we are
> > already beyond memblock_free_all(). Ideally memblock allocation should
> > fail, yet it succeeds, but later the system crashes, which is a
> > separately handled issue.
> > 
> > To resolve this bootstrap issue in a robust way, this patch adds another
> > way to request stack_depot_early_init(), which happens at a well-defined
> > point of time. In addition to build-time CONFIG_STACKDEPOT_ALWAYS_INIT,
> > code that's e.g. processing boot parameters (which happens early enough)
> > can call a new function stack_depot_want_early_init(), which sets a flag
> > that stack_depot_early_init() will check.
> > 
> > In this patch we also convert page_owner to this approach. While it
> > doesn't have the bootstrap issue that slub has, it's also functionality
> > enabled by a boot param and can thus request stack_depot_early_init()
> > with memblock allocation instead of later initialization with
> > kvmalloc().
> > 
> > As suggested by Mike, make stack_depot_early_init() only attempt
> > memblock allocation and stack_depot_init() only attempt kvmalloc().
> > Also change the latter to kvcalloc(). In both cases we can lose the
> > explicit array zeroing, which the allocations do already.
> > 
> > As suggested by Marco, provide empty implementations of the init
> > functions for !CONFIG_STACKDEPOT builds to simplify the callers.
> > 
> > [1] https://lore.kernel.org/all/YhnUcqyeMgCrWZbd@ip-172-31-19-208.ap-northeast-1.compute.internal/
> > 
> > Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
> > Suggested-by: Marco Elver <elver@google.com>
> > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> > Reviewed-by: Marco Elver <elver@google.com>
> > Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> >  include/linux/stackdepot.h | 26 ++++++++++++---
> >  lib/stackdepot.c           | 66 +++++++++++++++++++++++++-------------
> >  mm/page_owner.c            |  9 ++++--
> >  3 files changed, 72 insertions(+), 29 deletions(-)
> > 
> > diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
> > index 17f992fe6355..bc2797955de9 100644
> > --- a/include/linux/stackdepot.h
> > +++ b/include/linux/stackdepot.h
> > @@ -20,18 +20,36 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
> >  					gfp_t gfp_flags, bool can_alloc);
> >  
> >  /*
> > - * Every user of stack depot has to call this during its own init when it's
> > - * decided that it will be calling stack_depot_save() later.
> > + * Every user of stack depot has to call stack_depot_init() during its own init
> > + * when it's decided that it will be calling stack_depot_save() later. This is
> > + * recommended for e.g. modules initialized later in the boot process, when
> > + * slab_is_available() is true.
> >   *
> >   * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot
> >   * enabled as part of mm_init(), for subsystems where it's known at compile time
> >   * that stack depot will be used.
> > + *
> > + * Another alternative is to call stack_depot_want_early_init(), when the
> > + * decision to use stack depot is taken e.g. when evaluating kernel boot
> > + * parameters, which precedes the enablement point in mm_init().
> > + *
> > + * stack_depot_init() and stack_depot_want_early_init() can be called regardless
> > + * of CONFIG_STACKDEPOT and are no-op when disabled. The actual save/fetch/print
> > + * functions should only be called from code that makes sure CONFIG_STACKDEPOT
> > + * is enabled.
> >   */
> > +#ifdef CONFIG_STACKDEPOT
> >  int stack_depot_init(void);
> >  
> > -#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
> > -static inline int stack_depot_early_init(void)	{ return stack_depot_init(); }
> > +void __init stack_depot_want_early_init(void);
> > +
> > +/* This is supposed to be called only from mm_init() */
> > +int __init stack_depot_early_init(void);
> >  #else
> > +static inline int stack_depot_init(void) { return 0; }
> > +
> > +static inline void stack_depot_want_early_init(void) { }
> > +
> >  static inline int stack_depot_early_init(void)	{ return 0; }
> >  #endif
> >  
> > diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> > index bf5ba9af0500..6c4644c9ed44 100644
> > --- a/lib/stackdepot.c
> > +++ b/lib/stackdepot.c
> > @@ -66,6 +66,9 @@ struct stack_record {
> >  	unsigned long entries[];	/* Variable-sized array of entries. */
> >  };
> >  
> > +static bool __stack_depot_want_early_init __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT);
> > +static bool __stack_depot_early_init_passed __initdata;
> > +
> >  static void *stack_slabs[STACK_ALLOC_MAX_SLABS];
> >  
> >  static int depot_index;
> > @@ -162,38 +165,57 @@ static int __init is_stack_depot_disabled(char *str)
> >  }
> >  early_param("stack_depot_disable", is_stack_depot_disabled);
> >  
> > -/*
> > - * __ref because of memblock_alloc(), which will not be actually called after
> > - * the __init code is gone, because at that point slab_is_available() is true
> > - */
> > -__ref int stack_depot_init(void)
> > +void __init stack_depot_want_early_init(void)
> > +{
> > +	/* Too late to request early init now */
> > +	WARN_ON(__stack_depot_early_init_passed);
> > +
> > +	__stack_depot_want_early_init = true;
> > +}
> > +
> > +int __init stack_depot_early_init(void)
> > +{
> > +	size_t size;
> > +
> > +	/* This is supposed to be called only once, from mm_init() */
> > +	if (WARN_ON(__stack_depot_early_init_passed))
> > +		return 0;
> > +
> > +	__stack_depot_early_init_passed = true;
> > +
> > +	if (!__stack_depot_want_early_init || stack_depot_disable)
> > +		return 0;
> > +
> > +	pr_info("Stack Depot early init allocating hash table with memblock_alloc\n");
> > +	size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
> 
> I think the kvcalloc() in the main init path is very unlikely to fail, but 
> perhaps this memblock_alloc() might?  If so, a nit might be to include 
> this size as part of the printk.

memblock_alloc() is even more unlikely to fail than kvcalloc() ;-)
But printing the size won't hurt.

> Either way:
> 
> Acked-by: David Rientjes <rientjes@google.com>

-- 
Sincerely yours,
Mike.
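
To make the opt-in flow concrete: a subsystem enabled by a boot parameter
calls stack_depot_want_early_init() from its parameter handler, and
mm_init() later performs the actual memblock allocation via
stack_depot_early_init(). A minimal sketch modeled on the page_owner
conversion in this patch (the "my_feature" names are placeholders, not
code from the series):

	#include <linux/init.h>
	#include <linux/kstrtox.h>
	#include <linux/stackdepot.h>

	static bool my_feature_enabled __initdata;

	static int __init early_my_feature_param(char *buf)
	{
		int ret = kstrtobool(buf, &my_feature_enabled);

		if (!ret && my_feature_enabled)
			/* only sets a flag; no allocation happens yet */
			stack_depot_want_early_init();
		return ret;
	}
	early_param("my_feature", early_my_feature_param);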



Thread overview: 17+ messages
2022-04-04 16:41 [PATCH v3 0/6] SLUB debugfs improvements based on stackdepot Vlastimil Babka
2022-04-04 16:41 ` [PATCH v3 1/6] lib/stackdepot: allow requesting early initialization dynamically Vlastimil Babka
2022-04-05 21:40   ` David Rientjes
2022-04-06  8:55     ` Vlastimil Babka
2022-04-06 12:21     ` Mike Rapoport
2022-04-04 16:41 ` [PATCH v3 2/6] mm/slub: move struct track init out of set_track() Vlastimil Babka
2022-04-05 21:40   ` David Rientjes
2022-04-04 16:41 ` [PATCH v3 3/6] mm/slub: use stackdepot to save stack trace in objects Vlastimil Babka
2022-04-05 21:40   ` David Rientjes
2022-04-06  9:03     ` Vlastimil Babka
2022-04-04 16:41 ` [PATCH v3 4/6] mm/slub: distinguish and print stack traces in debugfs files Vlastimil Babka
2022-04-05 21:40   ` David Rientjes
2022-04-06  9:09     ` Vlastimil Babka
2022-04-04 16:41 ` [PATCH v3 5/6] mm/slub: sort debugfs output by frequency of stack traces Vlastimil Babka
2022-04-05 21:40   ` David Rientjes
2022-04-04 16:41 ` [PATCH v3 6/6] slab, documentation: add description of debugfs files for SLUB caches Vlastimil Babka
2022-04-05 21:40   ` David Rientjes
