* [PATCH v2 0/6] SLUB debugfs improvements based on stackdepot
@ 2022-03-02 17:31 Vlastimil Babka
  2022-03-02 17:31 ` [PATCH v2 1/6] lib/stackdepot: allow requesting early initialization dynamically Vlastimil Babka
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Vlastimil Babka @ 2022-03-02 17:31 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Faiyaz Mohammed, Marco Elver, Mike Rapoport, Hyeonggon Yoo,
	Imran Khan, Vlastimil Babka, Jonathan Corbet, Randy Dunlap

Changes since v1:
https://lore.kernel.org/all/20220225180318.20594-1-vbabka@suse.cz/

- New patch 1 to resolve bootstrap issues between stack depot and early
  cache creation, reported by Hyeonggon Yoo
- Patch 3 now always uses GFP_NOWAIT in set_track() to avoid sleeping in
  invalid context, reported by Hyeonggon Yoo
- Addressed minor feedback and added review/test tags from Hyeonggon Yoo.

Hi,

this series combines and revives patches from Oliver's bachelor thesis
from last year (which I advised) that make SLUB's debugfs files
alloc_traces and free_traces more useful.
The resubmission was blocked on stackdepot changes that have now been
merged, as explained in patch 3.

Patch 1 makes it possible to use stack depot without bootstrap issues.

Patch 2 is a new preparatory cleanup.

Patch 3, originally submitted here [1], was merged to mainline but
reverted due to stackdepot-related issues, as explained in the patch.

Patches 4-6 were originally submitted as an RFC here [2]. In this
submission I have omitted the new file 'all_objects' (patch 3/3 in [2]),
as it might be considered too intrusive, so I will postpone it for
later. The docs patch is adjusted accordingly.

Also available in git, based on v5.17-rc1:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v2

I'd like to ask for some more review/testing before I add this to the
slab tree.

[1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
[2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/

Oliver Glitta (4):
  mm/slub: use stackdepot to save stack trace in objects
  mm/slub: distinguish and print stack traces in debugfs files
  mm/slub: sort debugfs output by frequency of stack traces
  slab, documentation: add description of debugfs files for SLUB caches

Vlastimil Babka (2):
  lib/stackdepot: allow requesting early initialization dynamically
  mm/slub: move struct track init out of set_track()

 Documentation/vm/slub.rst  |  64 ++++++++++++++++++
 include/linux/stackdepot.h |  16 ++++-
 init/Kconfig               |   1 +
 lib/stackdepot.c           |   2 +
 mm/page_owner.c            |   9 ++-
 mm/slab_common.c           |   5 ++
 mm/slub.c                  | 135 +++++++++++++++++++++++++------------
 7 files changed, 183 insertions(+), 49 deletions(-)

-- 
2.35.1



* [PATCH v2 1/6] lib/stackdepot: allow requesting early initialization dynamically
  2022-03-02 17:31 [PATCH v2 0/6] SLUB debugfs improvements based on stackdepot Vlastimil Babka
@ 2022-03-02 17:31 ` Vlastimil Babka
  2022-03-02 17:47   ` Marco Elver
                     ` (2 more replies)
  2022-03-02 17:31 ` [PATCH v2 2/6] mm/slub: move struct track init out of set_track() Vlastimil Babka
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 19+ messages in thread
From: Vlastimil Babka @ 2022-03-02 17:31 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Faiyaz Mohammed, Marco Elver, Mike Rapoport, Hyeonggon Yoo,
	Imran Khan, Vlastimil Babka

In a later patch we want to add stackdepot support for object owner
tracking in slub caches, which is enabled by the slub_debug boot parameter.
This creates a bootstrap problem as some caches are created early in
boot when slab_is_available() is false and thus stack_depot_init()
tries to use memblock. But, as reported by Hyeonggon Yoo [1], we are
already beyond memblock_free_all(). Ideally memblock allocation should
fail, yet it succeeds, but later the system crashes, which is a
separately handled issue.

To resolve this bootstrap issue in a robust way, this patch adds another
way to request stack_depot_early_init(), which happens at a well-defined
point in time. In addition to build-time CONFIG_STACKDEPOT_ALWAYS_INIT,
code that's e.g. processing boot parameters (which happens early enough)
can set a new variable stack_depot_want_early_init to true.

In this patch we also convert page_owner to this approach. While it
doesn't have the bootstrap issue like slub does, it's also functionality
enabled by a boot param and can thus request stack_depot_early_init()
with memblock allocation instead of later initialization with
kvmalloc().

[1] https://lore.kernel.org/all/YhnUcqyeMgCrWZbd@ip-172-31-19-208.ap-northeast-1.compute.internal/
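
For illustration, a minimal sketch of an early boot parameter handler
opting in under this scheme (the "foo" subsystem and all its names are
made up for the example; the real user is slub_debug in a later patch):

	/* Hypothetical subsystem that will save stack traces if enabled. */
	static bool foo_debug_enabled;

	static int __init early_foo_debug_param(char *buf)
	{
		int ret = kstrtobool(buf, &foo_debug_enabled);

		/*
		 * early_param() handlers run before mm_init() calls
		 * stack_depot_early_init(), so this is early enough for
		 * the flag to take effect and the stack depot hash table
		 * to be allocated from memblock.
		 */
		if (foo_debug_enabled)
			stack_depot_want_early_init = true;

		return ret;
	}
	early_param("foo_debug", early_foo_debug_param);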

Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/stackdepot.h | 16 ++++++++++++++--
 lib/stackdepot.c           |  2 ++
 mm/page_owner.c            |  9 ++++++---
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 17f992fe6355..1217ba2b636e 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -15,6 +15,8 @@
 
 typedef u32 depot_stack_handle_t;
 
+extern bool stack_depot_want_early_init;
+
 depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 					unsigned int nr_entries,
 					gfp_t gfp_flags, bool can_alloc);
@@ -26,11 +28,21 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
  * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot
  * enabled as part of mm_init(), for subsystems where it's known at compile time
  * that stack depot will be used.
+ *
+ * Another alternative is to set stack_depot_want_early_init as true, when the
+ * decision to use stack depot is taken e.g. when evaluating kernel boot
+ * parameters, which precedes the call to stack_depot_early_init().
  */
 int stack_depot_init(void);
 
-#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
-static inline int stack_depot_early_init(void)	{ return stack_depot_init(); }
+#ifdef CONFIG_STACKDEPOT
+static inline int stack_depot_early_init(void)
+{
+	if (IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT)
+	    || stack_depot_want_early_init)
+		return stack_depot_init();
+	return 0;
+}
 #else
 static inline int stack_depot_early_init(void)	{ return 0; }
 #endif
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index bf5ba9af0500..02e2b5fcbf3b 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -66,6 +66,8 @@ struct stack_record {
 	unsigned long entries[];	/* Variable-sized array of entries. */
 };
 
+bool stack_depot_want_early_init = false;
+
 static void *stack_slabs[STACK_ALLOC_MAX_SLABS];
 
 static int depot_index;
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 99e360df9465..40dce2b81d13 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -42,7 +42,12 @@ static void init_early_allocated_pages(void);
 
 static int __init early_page_owner_param(char *buf)
 {
-	return kstrtobool(buf, &page_owner_enabled);
+	int ret = kstrtobool(buf, &page_owner_enabled);
+
+	if (page_owner_enabled)
+		stack_depot_want_early_init = true;
+
+	return ret;
 }
 early_param("page_owner", early_page_owner_param);
 
@@ -80,8 +85,6 @@ static __init void init_page_owner(void)
 	if (!page_owner_enabled)
 		return;
 
-	stack_depot_init();
-
 	register_dummy_stack();
 	register_failure_stack();
 	register_early_stack();
-- 
2.35.1



* [PATCH v2 2/6] mm/slub: move struct track init out of set_track()
  2022-03-02 17:31 [PATCH v2 0/6] SLUB debugfs improvements based on stackdepot Vlastimil Babka
  2022-03-02 17:31 ` [PATCH v2 1/6] lib/stackdepot: allow requesting early initialization dynamically Vlastimil Babka
@ 2022-03-02 17:31 ` Vlastimil Babka
  2022-03-02 17:31 ` [PATCH v2 3/6] mm/slub: use stackdepot to save stack trace in objects Vlastimil Babka
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Vlastimil Babka @ 2022-03-02 17:31 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Faiyaz Mohammed, Marco Elver, Mike Rapoport, Hyeonggon Yoo,
	Imran Khan, Vlastimil Babka

set_track() either zeroes out the struct track or fills it, depending on
the addr parameter. This is unnecessary as there's only one place that
calls it for the initialization - init_tracking(). We can simply do the
zeroing there, with a single memset() that covers both TRACK_ALLOC and
TRACK_FREE as they are adjacent.
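
As a sketch of the layout assumption the diff below relies on: the two
tracks are stored back to back in the object's metadata and indexed by
enum track_item, so a single memset() starting at the TRACK_ALLOC track
clears both:

	/* in mm/slub.c: */
	enum track_item { TRACK_ALLOC, TRACK_FREE };

	/* points at the first of the two adjacent structs */
	struct track *p = get_track(s, object, TRACK_ALLOC);

	/* zeroes both the TRACK_ALLOC and the TRACK_FREE track */
	memset(p, 0, 2 * sizeof(struct track));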

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/slub.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 261474092e43..1fc451f4fe62 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -729,34 +729,32 @@ static void set_track(struct kmem_cache *s, void *object,
 {
 	struct track *p = get_track(s, object, alloc);
 
-	if (addr) {
 #ifdef CONFIG_STACKTRACE
-		unsigned int nr_entries;
+	unsigned int nr_entries;
 
-		metadata_access_enable();
-		nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
-					      TRACK_ADDRS_COUNT, 3);
-		metadata_access_disable();
+	metadata_access_enable();
+	nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
+				      TRACK_ADDRS_COUNT, 3);
+	metadata_access_disable();
 
-		if (nr_entries < TRACK_ADDRS_COUNT)
-			p->addrs[nr_entries] = 0;
+	if (nr_entries < TRACK_ADDRS_COUNT)
+		p->addrs[nr_entries] = 0;
 #endif
-		p->addr = addr;
-		p->cpu = smp_processor_id();
-		p->pid = current->pid;
-		p->when = jiffies;
-	} else {
-		memset(p, 0, sizeof(struct track));
-	}
+	p->addr = addr;
+	p->cpu = smp_processor_id();
+	p->pid = current->pid;
+	p->when = jiffies;
 }
 
 static void init_tracking(struct kmem_cache *s, void *object)
 {
+	struct track *p;
+
 	if (!(s->flags & SLAB_STORE_USER))
 		return;
 
-	set_track(s, object, TRACK_FREE, 0UL);
-	set_track(s, object, TRACK_ALLOC, 0UL);
+	p = get_track(s, object, TRACK_ALLOC);
+	memset(p, 0, 2*sizeof(struct track));
 }
 
 static void print_track(const char *s, struct track *t, unsigned long pr_time)
-- 
2.35.1



* [PATCH v2 3/6] mm/slub: use stackdepot to save stack trace in objects
  2022-03-02 17:31 [PATCH v2 0/6] SLUB debugfs improvements based on stackdepot Vlastimil Babka
  2022-03-02 17:31 ` [PATCH v2 1/6] lib/stackdepot: allow requesting early initialization dynamically Vlastimil Babka
  2022-03-02 17:31 ` [PATCH v2 2/6] mm/slub: move struct track init out of set_track() Vlastimil Babka
@ 2022-03-02 17:31 ` Vlastimil Babka
  2022-03-04 11:25   ` Hyeonggon Yoo
  2022-03-02 17:31 ` [PATCH v2 4/6] mm/slub: distinguish and print stack traces in debugfs files Vlastimil Babka
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Vlastimil Babka @ 2022-03-02 17:31 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Faiyaz Mohammed, Marco Elver, Mike Rapoport, Hyeonggon Yoo,
	Imran Khan, Vlastimil Babka

From: Oliver Glitta <glittao@gmail.com>

Many stack traces are similar, so storing them in per-object arrays
duplicates a lot of data. Stackdepot saves each unique stack only once.

Replace the addrs field in struct track with a depot_stack_handle_t
handle, and use stackdepot to save the stack trace.

The benefits are smaller memory overhead and the possibility to
aggregate per-cache statistics in the following patch using the
stackdepot handle instead of matching stacks manually.

[ vbabka@suse.cz: rebase to 5.17-rc1 and adjust accordingly ]

This was initially merged as commit 788691464c29 and reverted by commit
ae14c63a9f20 due to several issues that should now be fixed.
The problem of unconditional memory overhead by stackdepot has been
addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init
and stack_table allocation by kvmalloc()"), so the dependency on
stackdepot will result in extra memory usage only when slab cache
tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
The build failures on some architectures were also addressed, and the
reported issue with the xfs/433 test did not reproduce on 5.17-rc1 with
this patch.
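
For reference, a rough sketch of the stackdepot round trip that the new
set_track() and print_track() below rely on (error handling trimmed;
stack_depot_save() returns 0 as the handle on failure):

	unsigned long entries[TRACK_ADDRS_COUNT];
	unsigned long *stored;
	unsigned int nr;
	depot_stack_handle_t handle;

	/* capture the current stack, skipping the innermost 3 frames */
	nr = stack_trace_save(entries, ARRAY_SIZE(entries), 3);

	/* deduplicate: the same trace always yields the same u32 handle */
	handle = stack_depot_save(entries, nr, GFP_NOWAIT);

	/* later: fetch it back (no copy, points into the depot) ... */
	nr = stack_depot_fetch(handle, &stored);

	/* ... or print it directly */
	stack_depot_print(handle);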

Signed-off-by: Oliver Glitta <glittao@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 init/Kconfig     |  1 +
 mm/slab_common.c |  5 ++++
 mm/slub.c        | 71 +++++++++++++++++++++++++++---------------------
 3 files changed, 46 insertions(+), 31 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index e9119bf54b1f..b21dd3a4a106 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1871,6 +1871,7 @@ config SLUB_DEBUG
 	default y
 	bool "Enable SLUB debugging support" if EXPERT
 	depends on SLUB && SYSFS
+	select STACKDEPOT if STACKTRACE_SUPPORT
 	help
 	  SLUB has extensive debug support features. Disabling these can
 	  result in significant savings in code size. This also disables
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 23f2ab0713b7..e51d50d03000 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -24,6 +24,7 @@
 #include <asm/tlbflush.h>
 #include <asm/page.h>
 #include <linux/memcontrol.h>
+#include <linux/stackdepot.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/kmem.h>
@@ -314,9 +315,13 @@ kmem_cache_create_usercopy(const char *name,
 	 * If no slub_debug was enabled globally, the static key is not yet
 	 * enabled by setup_slub_debug(). Enable it if the cache is being
 	 * created with any of the debugging flags passed explicitly.
+	 * It's also possible that this is the first cache created with
+	 * SLAB_STORE_USER and we should init stack_depot for it.
 	 */
 	if (flags & SLAB_DEBUG_FLAGS)
 		static_branch_enable(&slub_debug_enabled);
+	if (flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
+		stack_depot_init();
 #endif
 
 	mutex_lock(&slab_mutex);
diff --git a/mm/slub.c b/mm/slub.c
index 1fc451f4fe62..42cb79af70a0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -26,6 +26,7 @@
 #include <linux/cpuset.h>
 #include <linux/mempolicy.h>
 #include <linux/ctype.h>
+#include <linux/stackdepot.h>
 #include <linux/debugobjects.h>
 #include <linux/kallsyms.h>
 #include <linux/kfence.h>
@@ -264,8 +265,8 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
 #define TRACK_ADDRS_COUNT 16
 struct track {
 	unsigned long addr;	/* Called from address */
-#ifdef CONFIG_STACKTRACE
-	unsigned long addrs[TRACK_ADDRS_COUNT];	/* Called from address */
+#ifdef CONFIG_STACKDEPOT
+	depot_stack_handle_t handle;
 #endif
 	int cpu;		/* Was running on cpu */
 	int pid;		/* Pid context */
@@ -724,22 +725,19 @@ static struct track *get_track(struct kmem_cache *s, void *object,
 	return kasan_reset_tag(p + alloc);
 }
 
-static void set_track(struct kmem_cache *s, void *object,
+static void noinline set_track(struct kmem_cache *s, void *object,
 			enum track_item alloc, unsigned long addr)
 {
 	struct track *p = get_track(s, object, alloc);
 
-#ifdef CONFIG_STACKTRACE
+#ifdef CONFIG_STACKDEPOT
+	unsigned long entries[TRACK_ADDRS_COUNT];
 	unsigned int nr_entries;
 
-	metadata_access_enable();
-	nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
-				      TRACK_ADDRS_COUNT, 3);
-	metadata_access_disable();
-
-	if (nr_entries < TRACK_ADDRS_COUNT)
-		p->addrs[nr_entries] = 0;
+	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);
+	p->handle = stack_depot_save(entries, nr_entries, GFP_NOWAIT);
 #endif
+
 	p->addr = addr;
 	p->cpu = smp_processor_id();
 	p->pid = current->pid;
@@ -759,20 +757,19 @@ static void init_tracking(struct kmem_cache *s, void *object)
 
 static void print_track(const char *s, struct track *t, unsigned long pr_time)
 {
+	depot_stack_handle_t handle __maybe_unused;
+
 	if (!t->addr)
 		return;
 
 	pr_err("%s in %pS age=%lu cpu=%u pid=%d\n",
 	       s, (void *)t->addr, pr_time - t->when, t->cpu, t->pid);
-#ifdef CONFIG_STACKTRACE
-	{
-		int i;
-		for (i = 0; i < TRACK_ADDRS_COUNT; i++)
-			if (t->addrs[i])
-				pr_err("\t%pS\n", (void *)t->addrs[i]);
-			else
-				break;
-	}
+#ifdef CONFIG_STACKDEPOT
+	handle = READ_ONCE(t->handle);
+	if (handle)
+		stack_depot_print(handle);
+	else
+		pr_err("object allocation/free stack trace missing\n");
 #endif
 }
 
@@ -1532,6 +1529,8 @@ static int __init setup_slub_debug(char *str)
 			global_slub_debug_changed = true;
 		} else {
 			slab_list_specified = true;
+			if (flags & SLAB_STORE_USER)
+				stack_depot_want_early_init = true;
 		}
 	}
 
@@ -1549,6 +1548,8 @@ static int __init setup_slub_debug(char *str)
 	}
 out:
 	slub_debug = global_flags;
+	if (slub_debug & SLAB_STORE_USER)
+		stack_depot_want_early_init = true;
 	if (slub_debug != 0 || slub_debug_string)
 		static_branch_enable(&slub_debug_enabled);
 	else
@@ -4352,18 +4353,26 @@ void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
 	objp = fixup_red_left(s, objp);
 	trackp = get_track(s, objp, TRACK_ALLOC);
 	kpp->kp_ret = (void *)trackp->addr;
-#ifdef CONFIG_STACKTRACE
-	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
-		kpp->kp_stack[i] = (void *)trackp->addrs[i];
-		if (!kpp->kp_stack[i])
-			break;
-	}
+#ifdef CONFIG_STACKDEPOT
+	{
+		depot_stack_handle_t handle;
+		unsigned long *entries;
+		unsigned int nr_entries;
+
+		handle = READ_ONCE(trackp->handle);
+		if (handle) {
+			nr_entries = stack_depot_fetch(handle, &entries);
+			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
+				kpp->kp_stack[i] = (void *)entries[i];
+		}
 
-	trackp = get_track(s, objp, TRACK_FREE);
-	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
-		kpp->kp_free_stack[i] = (void *)trackp->addrs[i];
-		if (!kpp->kp_free_stack[i])
-			break;
+		trackp = get_track(s, objp, TRACK_FREE);
+		handle = READ_ONCE(trackp->handle);
+		if (handle) {
+			nr_entries = stack_depot_fetch(handle, &entries);
+			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
+				kpp->kp_free_stack[i] = (void *)entries[i];
+		}
 	}
 #endif
 #endif
-- 
2.35.1



* [PATCH v2 4/6] mm/slub: distinguish and print stack traces in debugfs files
  2022-03-02 17:31 [PATCH v2 0/6] SLUB debugfs improvements based on stackdepot Vlastimil Babka
                   ` (2 preceding siblings ...)
  2022-03-02 17:31 ` [PATCH v2 3/6] mm/slub: use stackdepot to save stack trace in objects Vlastimil Babka
@ 2022-03-02 17:31 ` Vlastimil Babka
  2022-03-02 17:31 ` [PATCH v2 5/6] mm/slub: sort debugfs output by frequency of stack traces Vlastimil Babka
  2022-03-02 17:31 ` [PATCH v2 6/6] slab, documentation: add description of debugfs files for SLUB caches Vlastimil Babka
  5 siblings, 0 replies; 19+ messages in thread
From: Vlastimil Babka @ 2022-03-02 17:31 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Faiyaz Mohammed, Marco Elver, Mike Rapoport, Hyeonggon Yoo,
	Imran Khan, Vlastimil Babka

From: Oliver Glitta <glittao@gmail.com>

Aggregate objects in a slub cache by their unique stack trace, in
addition to the caller address, when producing the contents of the
debugfs files alloc_traces and free_traces. Also add the stack traces
themselves to the debugfs output. This makes the files much more
useful, e.g. for debugging memory leaks.

Signed-off-by: Oliver Glitta <glittao@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/slub.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 42cb79af70a0..25138cd7a4ef 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5074,6 +5074,7 @@ EXPORT_SYMBOL(validate_slab_cache);
  */
 
 struct location {
+	depot_stack_handle_t handle;
 	unsigned long count;
 	unsigned long addr;
 	long long sum_time;
@@ -5126,9 +5127,13 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 {
 	long start, end, pos;
 	struct location *l;
-	unsigned long caddr;
+	unsigned long caddr, chandle;
 	unsigned long age = jiffies - track->when;
+	depot_stack_handle_t handle = 0;
 
+#ifdef CONFIG_STACKDEPOT
+	handle = READ_ONCE(track->handle);
+#endif
 	start = -1;
 	end = t->count;
 
@@ -5143,7 +5148,8 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 			break;
 
 		caddr = t->loc[pos].addr;
-		if (track->addr == caddr) {
+		chandle = t->loc[pos].handle;
+		if ((track->addr == caddr) && (handle == chandle)) {
 
 			l = &t->loc[pos];
 			l->count++;
@@ -5168,6 +5174,8 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 
 		if (track->addr < caddr)
 			end = pos;
+		else if (track->addr == caddr && handle < chandle)
+			end = pos;
 		else
 			start = pos;
 	}
@@ -5190,6 +5198,7 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 	l->max_time = age;
 	l->min_pid = track->pid;
 	l->max_pid = track->pid;
+	l->handle = handle;
 	cpumask_clear(to_cpumask(l->cpus));
 	cpumask_set_cpu(track->cpu, to_cpumask(l->cpus));
 	nodes_clear(l->nodes);
@@ -6101,6 +6110,21 @@ static int slab_debugfs_show(struct seq_file *seq, void *v)
 			seq_printf(seq, " nodes=%*pbl",
 				 nodemask_pr_args(&l->nodes));
 
+#ifdef CONFIG_STACKDEPOT
+		{
+			depot_stack_handle_t handle;
+			unsigned long *entries;
+			unsigned int nr_entries, j;
+
+			handle = READ_ONCE(l->handle);
+			if (handle) {
+				nr_entries = stack_depot_fetch(handle, &entries);
+				seq_puts(seq, "\n");
+				for (j = 0; j < nr_entries; j++)
+					seq_printf(seq, "        %pS\n", (void *)entries[j]);
+			}
+		}
+#endif
 		seq_puts(seq, "\n");
 	}
 
-- 
2.35.1



* [PATCH v2 5/6] mm/slub: sort debugfs output by frequency of stack traces
  2022-03-02 17:31 [PATCH v2 0/6] SLUB debugfs improvements based on stackdepot Vlastimil Babka
                   ` (3 preceding siblings ...)
  2022-03-02 17:31 ` [PATCH v2 4/6] mm/slub: distinguish and print stack traces in debugfs files Vlastimil Babka
@ 2022-03-02 17:31 ` Vlastimil Babka
  2022-03-02 17:31 ` [PATCH v2 6/6] slab, documentation: add description of debugfs files for SLUB caches Vlastimil Babka
  5 siblings, 0 replies; 19+ messages in thread
From: Vlastimil Babka @ 2022-03-02 17:31 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Faiyaz Mohammed, Marco Elver, Mike Rapoport, Hyeonggon Yoo,
	Imran Khan, Vlastimil Babka

From: Oliver Glitta <glittao@gmail.com>

Sort the output of the debugfs files alloc_traces and free_traces by
the frequency of the allocation/freeing stack traces. The most
frequently used stack traces are printed first, e.g. for easier memory
leak debugging.

Signed-off-by: Oliver Glitta <glittao@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/slub.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index 25138cd7a4ef..39da762fbceb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -38,6 +38,7 @@
 #include <linux/memcontrol.h>
 #include <linux/random.h>
 #include <kunit/test.h>
+#include <linux/sort.h>
 
 #include <linux/debugfs.h>
 #include <trace/events/kmem.h>
@@ -6149,6 +6150,17 @@ static void *slab_debugfs_next(struct seq_file *seq, void *v, loff_t *ppos)
 	return NULL;
 }
 
+static int cmp_loc_by_count(const void *a, const void *b, const void *data)
+{
+	struct location *loc1 = (struct location *)a;
+	struct location *loc2 = (struct location *)b;
+
+	if (loc1->count > loc2->count)
+		return -1;
+	else
+		return 1;
+}
+
 static void *slab_debugfs_start(struct seq_file *seq, loff_t *ppos)
 {
 	struct loc_track *t = seq->private;
@@ -6210,6 +6222,10 @@ static int slab_debug_trace_open(struct inode *inode, struct file *filep)
 		spin_unlock_irqrestore(&n->list_lock, flags);
 	}
 
+	/* Sort locations by count */
+	sort_r(t->loc, t->count, sizeof(struct location),
+		cmp_loc_by_count, NULL, NULL);
+
 	bitmap_free(obj_map);
 	return 0;
 }
-- 
2.35.1



* [PATCH v2 6/6] slab, documentation: add description of debugfs files for SLUB caches
  2022-03-02 17:31 [PATCH v2 0/6] SLUB debugfs improvements based on stackdepot Vlastimil Babka
                   ` (4 preceding siblings ...)
  2022-03-02 17:31 ` [PATCH v2 5/6] mm/slub: sort debugfs output by frequency of stack traces Vlastimil Babka
@ 2022-03-02 17:31 ` Vlastimil Babka
  2022-03-03  8:14   ` Hyeonggon Yoo
  2022-03-03  9:33   ` Mike Rapoport
  5 siblings, 2 replies; 19+ messages in thread
From: Vlastimil Babka @ 2022-03-02 17:31 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Faiyaz Mohammed, Marco Elver, Mike Rapoport, Hyeonggon Yoo,
	Imran Khan, Vlastimil Babka, Jonathan Corbet, Randy Dunlap,
	linux-doc

From: Oliver Glitta <glittao@gmail.com>

Add description of debugfs files alloc_traces and free_traces
to SLUB cache documentation.

[ vbabka@suse.cz: some rewording ]

Signed-off-by: Oliver Glitta <glittao@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
---
 Documentation/vm/slub.rst | 64 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index d3028554b1e9..43063ade737a 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -384,5 +384,69 @@ c) Execute ``slabinfo-gnuplot.sh`` in '-t' mode, passing all of the
       40,60`` range will plot only samples collected between 40th and
       60th seconds).
 
+
+DebugFS files for SLUB
+======================
+
+For more information about the current state of SLUB caches with the user tracking
+debug option enabled, debugfs files are available, typically under
+/sys/kernel/debug/slab/<cache>/ (created only for caches with user
+tracking). There are 2 types of these files with the following debug
+information:
+
+1. alloc_traces::
+
+    Prints information about unique allocation traces of the currently
+    allocated objects. The output is sorted by frequency of each trace.
+
+    Information in the output:
+    Number of objects, allocating function, minimal/average/maximal jiffies since alloc,
+    pid range of the allocating processes, cpu mask of allocating cpus, and stack trace.
+
+    Example:::
+
+    1085 populate_error_injection_list+0x97/0x110 age=166678/166680/166682 pid=1 cpus=1::
+	__slab_alloc+0x6d/0x90
+	kmem_cache_alloc_trace+0x2eb/0x300
+	populate_error_injection_list+0x97/0x110
+	init_error_injection+0x1b/0x71
+	do_one_initcall+0x5f/0x2d0
+	kernel_init_freeable+0x26f/0x2d7
+	kernel_init+0xe/0x118
+	ret_from_fork+0x22/0x30
+
+
+2. free_traces::
+
+    Prints information about unique freeing traces of the currently allocated
+    objects. The freeing traces thus come from the previous life-cycle of the
+    objects and are reported as not available for objects allocated for the first
+    time. The output is sorted by frequency of each trace.
+
+    Information in the output:
+    Number of objects, freeing function, minimal/average/maximal jiffies since free,
+    pid range of the freeing processes, cpu mask of freeing cpus, and stack trace.
+
+    Example:::
+
+    1980 <not-available> age=4294912290 pid=0 cpus=0
+    51 acpi_ut_update_ref_count+0x6a6/0x782 age=236886/237027/237772 pid=1 cpus=1
+	kfree+0x2db/0x420
+	acpi_ut_update_ref_count+0x6a6/0x782
+	acpi_ut_update_object_reference+0x1ad/0x234
+	acpi_ut_remove_reference+0x7d/0x84
+	acpi_rs_get_prt_method_data+0x97/0xd6
+	acpi_get_irq_routing_table+0x82/0xc4
+	acpi_pci_irq_find_prt_entry+0x8e/0x2e0
+	acpi_pci_irq_lookup+0x3a/0x1e0
+	acpi_pci_irq_enable+0x77/0x240
+	pcibios_enable_device+0x39/0x40
+	do_pci_enable_device.part.0+0x5d/0xe0
+	pci_enable_device_flags+0xfc/0x120
+	pci_enable_device+0x13/0x20
+	virtio_pci_probe+0x9e/0x170
+	local_pci_probe+0x48/0x80
+	pci_device_probe+0x105/0x1c0
+
 Christoph Lameter, May 30, 2007
 Sergey Senozhatsky, October 23, 2015
-- 
2.35.1



* Re: [PATCH v2 1/6] lib/stackdepot: allow requesting early initialization dynamically
  2022-03-02 17:31 ` [PATCH v2 1/6] lib/stackdepot: allow requesting early initialization dynamically Vlastimil Babka
@ 2022-03-02 17:47   ` Marco Elver
  2022-03-02 18:02     ` Vlastimil Babka
  2022-03-02 18:01   ` Mike Rapoport
  2022-03-03 19:19   ` [PATCH v3r0 " Vlastimil Babka
  2 siblings, 1 reply; 19+ messages in thread
From: Marco Elver @ 2022-03-02 17:47 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Mike Rapoport, Hyeonggon Yoo,
	Imran Khan

On Wed, Mar 02, 2022 at 06:31PM +0100, Vlastimil Babka wrote:
> In a later patch we want to add stackdepot support for object owner
> tracking in slub caches, which is enabled by the slub_debug boot parameter.
> This creates a bootstrap problem as some caches are created early in
> boot when slab_is_available() is false and thus stack_depot_init()
> tries to use memblock. But, as reported by Hyeonggon Yoo [1], we are
> already beyond memblock_free_all(). Ideally memblock allocation should
> fail, yet it succeeds, but later the system crashes, which is a
> separately handled issue.
> 
> To resolve this bootstrap issue in a robust way, this patch adds another
> way to request stack_depot_early_init(), which happens at a well-defined
> point in time. In addition to build-time CONFIG_STACKDEPOT_ALWAYS_INIT,
> code that's e.g. processing boot parameters (which happens early enough)
> can set a new variable stack_depot_want_early_init to true.

Agree, I think this is the best solution.

> In this patch we also convert page_owner to this approach. While it
> doesn't have the bootstrap issue like slub does, it's also functionality
> enabled by a boot param and can thus request stack_depot_early_init()
> with memblock allocation instead of later initialization with
> kvmalloc().
> 
> [1] https://lore.kernel.org/all/YhnUcqyeMgCrWZbd@ip-172-31-19-208.ap-northeast-1.compute.internal/
> 
> Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  include/linux/stackdepot.h | 16 ++++++++++++++--
>  lib/stackdepot.c           |  2 ++
>  mm/page_owner.c            |  9 ++++++---
>  3 files changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
> index 17f992fe6355..1217ba2b636e 100644
> --- a/include/linux/stackdepot.h
> +++ b/include/linux/stackdepot.h
> @@ -15,6 +15,8 @@
>  
>  typedef u32 depot_stack_handle_t;
>  
> +extern bool stack_depot_want_early_init;
> +
>  depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>  					unsigned int nr_entries,
>  					gfp_t gfp_flags, bool can_alloc);
> @@ -26,11 +28,21 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>   * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot
>   * enabled as part of mm_init(), for subsystems where it's known at compile time
>   * that stack depot will be used.
> + *
> + * Another alternative is to set stack_depot_want_early_init as true, when the
> + * decision to use stack depot is taken e.g. when evaluating kernel boot
> + * parameters, which precedes the call to stack_depot_early_init().
>   */
>  int stack_depot_init(void);

I think for stack_depot_init() it might now be convenient to provide a
no-op version automatically if !STACKDEPOT, which would avoid
some 'if (.. && IS_ENABLED(CONFIG_STACKDEPOT))' in a later patch.

Similarly for stack_depot_want_early_init: instead you could provide
stack_depot_want_early_init() as a function, which simply sets a boolean
__stack_depot_want_early_init. If !STACKDEPOT, it'll also just be a
no-op function.
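
E.g. an (untested) sketch of what the header could look like:

	#ifdef CONFIG_STACKDEPOT
	int stack_depot_init(void);
	void stack_depot_want_early_init(void);
	int stack_depot_early_init(void);
	#else
	static inline int stack_depot_init(void) { return 0; }
	static inline void stack_depot_want_early_init(void) { }
	static inline int stack_depot_early_init(void) { return 0; }
	#endif

Then the callers wouldn't need any IS_ENABLED(CONFIG_STACKDEPOT) checks
at all.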

>  
> -#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
> -static inline int stack_depot_early_init(void)	{ return stack_depot_init(); }
> +#ifdef CONFIG_STACKDEPOT
> +static inline int stack_depot_early_init(void)
> +{
> +	if (IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT)
> +	    || stack_depot_want_early_init)
> +		return stack_depot_init();
> +	return 0;
> +}
>  #else
>  static inline int stack_depot_early_init(void)	{ return 0; }
>  #endif
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index bf5ba9af0500..02e2b5fcbf3b 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -66,6 +66,8 @@ struct stack_record {
>  	unsigned long entries[];	/* Variable-sized array of entries. */
>  };
>  
> +bool stack_depot_want_early_init = false;
> +

This can be __initdata, right?

>  static void *stack_slabs[STACK_ALLOC_MAX_SLABS];
>  
>  static int depot_index;
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 99e360df9465..40dce2b81d13 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -42,7 +42,12 @@ static void init_early_allocated_pages(void);
>  
>  static int __init early_page_owner_param(char *buf)
>  {
> -	return kstrtobool(buf, &page_owner_enabled);
> +	int ret = kstrtobool(buf, &page_owner_enabled);
> +
> +	if (page_owner_enabled)
> +		stack_depot_want_early_init = true;
> +
> +	return ret;
>  }
>  early_param("page_owner", early_page_owner_param);
>  
> @@ -80,8 +85,6 @@ static __init void init_page_owner(void)
>  	if (!page_owner_enabled)
>  		return;
>  
> -	stack_depot_init();
> -
>  	register_dummy_stack();
>  	register_failure_stack();
>  	register_early_stack();
> -- 
> 2.35.1
> 


* Re: [PATCH v2 1/6] lib/stackdepot: allow requesting early initialization dynamically
  2022-03-02 17:31 ` [PATCH v2 1/6] lib/stackdepot: allow requesting early initialization dynamically Vlastimil Babka
  2022-03-02 17:47   ` Marco Elver
@ 2022-03-02 18:01   ` Mike Rapoport
  2022-03-03 19:19   ` [PATCH v3r0 " Vlastimil Babka
  2 siblings, 0 replies; 19+ messages in thread
From: Mike Rapoport @ 2022-03-02 18:01 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Marco Elver, Hyeonggon Yoo,
	Imran Khan

On Wed, Mar 02, 2022 at 06:31:17PM +0100, Vlastimil Babka wrote:
> In a later patch we want to add stackdepot support for object owner
> tracking in slub caches, which is enabled by the slub_debug boot parameter.
> This creates a bootstrap problem as some caches are created early in
> boot when slab_is_available() is false and thus stack_depot_init()
> tries to use memblock. But, as reported by Hyeonggon Yoo [1], we are
> already beyond memblock_free_all(). Ideally memblock allocation should
> fail, yet it succeeds, but later the system crashes, which is a
> separately handled issue.
> 
> To resolve this bootstrap issue in a robust way, this patch adds another
> way to request stack_depot_early_init(), which happens at a well-defined
> point in time. In addition to build-time CONFIG_STACKDEPOT_ALWAYS_INIT,
> code that's e.g. processing boot parameters (which happens early enough)
> can set a new variable stack_depot_want_early_init to true.
> 
> In this patch we also convert page_owner to this approach. While it
> doesn't have the bootstrap issue like slub does, it's also functionality
> enabled by a boot param and can thus request stack_depot_early_init()
> with memblock allocation instead of later initialization with
> kvmalloc().
> 
> [1] https://lore.kernel.org/all/YhnUcqyeMgCrWZbd@ip-172-31-19-208.ap-northeast-1.compute.internal/
> 
> Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  include/linux/stackdepot.h | 16 ++++++++++++++--
>  lib/stackdepot.c           |  2 ++
>  mm/page_owner.c            |  9 ++++++---
>  3 files changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
> index 17f992fe6355..1217ba2b636e 100644
> --- a/include/linux/stackdepot.h
> +++ b/include/linux/stackdepot.h
> @@ -15,6 +15,8 @@
>  
>  typedef u32 depot_stack_handle_t;
>  
> +extern bool stack_depot_want_early_init;
> +
>  depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>  					unsigned int nr_entries,
>  					gfp_t gfp_flags, bool can_alloc);
> @@ -26,11 +28,21 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>   * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot
>   * enabled as part of mm_init(), for subsystems where it's known at compile time
>   * that stack depot will be used.
> + *
> + * Another alternative is to set stack_depot_want_early_init as true, when the
> + * decision to use stack depot is taken e.g. when evaluating kernel boot
> + * parameters, which precedes the call to stack_depot_early_init().
>   */
>  int stack_depot_init(void);
>  
> -#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
> -static inline int stack_depot_early_init(void)	{ return stack_depot_init(); }
> +#ifdef CONFIG_STACKDEPOT
> +static inline int stack_depot_early_init(void)
> +{
> +	if (IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT)
> +	    || stack_depot_want_early_init)
> +		return stack_depot_init();
> +	return 0;
> +}

I'd also suggest splitting the memblock allocation out of
stack_depot_init() into stack_depot_early_init().
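
I.e. roughly the following, ignoring the locking and the one-time-init
guard that stack_depot_init() currently has (stack_table and
STACK_HASH_SIZE being the existing ones in lib/stackdepot.c):

	int __init stack_depot_early_init(void)
	{
		size_t size = STACK_HASH_SIZE * sizeof(struct stack_record *);

		/* only ever attempt the memblock allocation here ... */
		stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
		return stack_table ? 0 : -ENOMEM;
	}

	int stack_depot_init(void)
	{
		/* ... and only the slab-based allocation here */
		stack_table = kvcalloc(STACK_HASH_SIZE,
				       sizeof(struct stack_record *),
				       GFP_KERNEL);
		return stack_table ? 0 : -ENOMEM;
	}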

>  #else
>  static inline int stack_depot_early_init(void)	{ return 0; }
>  #endif
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index bf5ba9af0500..02e2b5fcbf3b 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -66,6 +66,8 @@ struct stack_record {
>  	unsigned long entries[];	/* Variable-sized array of entries. */
>  };
>  
> +bool stack_depot_want_early_init = false;
> +
>  static void *stack_slabs[STACK_ALLOC_MAX_SLABS];
>  
>  static int depot_index;
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 99e360df9465..40dce2b81d13 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -42,7 +42,12 @@ static void init_early_allocated_pages(void);
>  
>  static int __init early_page_owner_param(char *buf)
>  {
> -	return kstrtobool(buf, &page_owner_enabled);
> +	int ret = kstrtobool(buf, &page_owner_enabled);
> +
> +	if (page_owner_enabled)
> +		stack_depot_want_early_init = true;
> +
> +	return ret;
>  }
>  early_param("page_owner", early_page_owner_param);
>  
> @@ -80,8 +85,6 @@ static __init void init_page_owner(void)
>  	if (!page_owner_enabled)
>  		return;
>  
> -	stack_depot_init();
> -
>  	register_dummy_stack();
>  	register_failure_stack();
>  	register_early_stack();
> -- 
> 2.35.1
> 

-- 
Sincerely yours,
Mike.


* Re: [PATCH v2 1/6] lib/stackdepot: allow requesting early initialization dynamically
  2022-03-02 17:47   ` Marco Elver
@ 2022-03-02 18:02     ` Vlastimil Babka
  2022-03-02 18:15       ` Marco Elver
  0 siblings, 1 reply; 19+ messages in thread
From: Vlastimil Babka @ 2022-03-02 18:02 UTC (permalink / raw)
  To: Marco Elver
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Mike Rapoport, Hyeonggon Yoo,
	Imran Khan

On 3/2/22 18:47, Marco Elver wrote:
> On Wed, Mar 02, 2022 at 06:31PM +0100, Vlastimil Babka wrote:
>> In a later patch we want to add stackdepot support for object owner
>> tracking in slub caches, which is enabled by the slub_debug boot parameter.
>> This creates a bootstrap problem as some caches are created early in
>> boot when slab_is_available() is false and thus stack_depot_init()
>> tries to use memblock. But, as reported by Hyeonggon Yoo [1], we are
>> already beyond memblock_free_all(). Ideally memblock allocation should
>> fail, yet it succeeds, but later the system crashes, which is a
>> separately handled issue.
>> 
>> To resolve this bootstrap issue in a robust way, this patch adds another
>> way to request stack_depot_early_init(), which happens at a well-defined
>> point in time. In addition to build-time CONFIG_STACKDEPOT_ALWAYS_INIT,
>> code that's e.g. processing boot parameters (which happens early enough)
>> can set a new variable stack_depot_want_early_init to true.
> 
> Agree, I think this is the best solution.

Thanks.

>> In this patch we also convert page_owner to this approach. While it
>> doesn't have the bootstrap issue like slub does, it's also functionality
>> enabled by a boot param and can thus request stack_depot_early_init()
>> with memblock allocation instead of later initialization with
>> kvmalloc().
>> 
>> [1] https://lore.kernel.org/all/YhnUcqyeMgCrWZbd@ip-172-31-19-208.ap-northeast-1.compute.internal/
>> 
>> Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
>>  include/linux/stackdepot.h | 16 ++++++++++++++--
>>  lib/stackdepot.c           |  2 ++
>>  mm/page_owner.c            |  9 ++++++---
>>  3 files changed, 22 insertions(+), 5 deletions(-)
>> 
>> diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
>> index 17f992fe6355..1217ba2b636e 100644
>> --- a/include/linux/stackdepot.h
>> +++ b/include/linux/stackdepot.h
>> @@ -15,6 +15,8 @@
>>  
>>  typedef u32 depot_stack_handle_t;
>>  
>> +extern bool stack_depot_want_early_init;
>> +
>>  depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>>  					unsigned int nr_entries,
>>  					gfp_t gfp_flags, bool can_alloc);
>> @@ -26,11 +28,21 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>>   * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot
>>   * enabled as part of mm_init(), for subsystems where it's known at compile time
>>   * that stack depot will be used.
>> + *
>> + * Another alternative is to set stack_depot_want_early_init as true, when the
>> + * decision to use stack depot is taken e.g. when evaluating kernel boot
>> + * parameters, which precedes the call to stack_depot_early_init().
>>   */
>>  int stack_depot_init(void);
> 
> I think for stack_depot_init() it might now be convenient to provide a
> no-op version automatically if !STACKDEPOT, which would avoid
> some 'if (.. && IS_ENABLED(CONFIG_STACKDEPOT))' in a later patch.
> 
> Similarly for stack_depot_want_early_init: instead you could provide
> stack_depot_want_early_init() as a function, which simply sets a boolean
> __stack_depot_want_early_init. If !STACKDEPOT, it'll also just be a
> no-op function.

Yeah, makes sense. I guess I have patch 3/6 wrong now anyway, as with
!STACKDEPOT it should fail to link due to the missing stack_depot_want_early_init...

>>  
>> -#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
>> -static inline int stack_depot_early_init(void)	{ return stack_depot_init(); }
>> +#ifdef CONFIG_STACKDEPOT
>> +static inline int stack_depot_early_init(void)
>> +{
>> +	if (IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT)
>> +	    || stack_depot_want_early_init)
>> +		return stack_depot_init();
>> +	return 0;
>> +}
>>  #else
>>  static inline int stack_depot_early_init(void)	{ return 0; }
>>  #endif
>> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
>> index bf5ba9af0500..02e2b5fcbf3b 100644
>> --- a/lib/stackdepot.c
>> +++ b/lib/stackdepot.c
>> @@ -66,6 +66,8 @@ struct stack_record {
>>  	unsigned long entries[];	/* Variable-sized array of entries. */
>>  };
>>  
>> +bool stack_depot_want_early_init = false;
>> +
> 
> This can be __initdata, right?
 
I initially thought so too, but in include/linux/init.h found
 * Don't forget to initialize data not at file scope, i.e. within a function,
 * as gcc otherwise puts the data into the bss section and not into the init
 * section.
But maybe that's just outdated as everyone seems to init them at file scope.

>>  static void *stack_slabs[STACK_ALLOC_MAX_SLABS];
>>  
>>  static int depot_index;
>> diff --git a/mm/page_owner.c b/mm/page_owner.c
>> index 99e360df9465..40dce2b81d13 100644
>> --- a/mm/page_owner.c
>> +++ b/mm/page_owner.c
>> @@ -42,7 +42,12 @@ static void init_early_allocated_pages(void);
>>  
>>  static int __init early_page_owner_param(char *buf)
>>  {
>> -	return kstrtobool(buf, &page_owner_enabled);
>> +	int ret = kstrtobool(buf, &page_owner_enabled);
>> +
>> +	if (page_owner_enabled)
>> +		stack_depot_want_early_init = true;
>> +
>> +	return ret;
>>  }
>>  early_param("page_owner", early_page_owner_param);
>>  
>> @@ -80,8 +85,6 @@ static __init void init_page_owner(void)
>>  	if (!page_owner_enabled)
>>  		return;
>>  
>> -	stack_depot_init();
>> -
>>  	register_dummy_stack();
>>  	register_failure_stack();
>>  	register_early_stack();
>> -- 
>> 2.35.1
>> 



* Re: [PATCH v2 1/6] lib/stackdepot: allow requesting early initialization dynamically
  2022-03-02 18:02     ` Vlastimil Babka
@ 2022-03-02 18:15       ` Marco Elver
  0 siblings, 0 replies; 19+ messages in thread
From: Marco Elver @ 2022-03-02 18:15 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Mike Rapoport, Hyeonggon Yoo,
	Imran Khan

On Wed, 2 Mar 2022 at 19:02, Vlastimil Babka <vbabka@suse.cz> wrote:
[...]
> > Similarly for stack_depot_want_early_init: instead you could provide
> > stack_depot_want_early_init() as a function, which simply sets a boolean
> > __stack_depot_want_early_init. If !STACKDEPOT, it'll also just be a
> > no-op function.
>
> Yeah, makes sense. I guess I have patch 3/6 wrong now anyway, as with
> !STACKDEPOT it should fail to link due to the missing stack_depot_want_early_init...

Right. It probably still worked because the compiler likely optimizes
out the dead call, but you never know...

> >> +bool stack_depot_want_early_init = false;
> >> +
> >
> > This can be __initdata, right?
>
> I initially thought so too, but in include/linux/init.h found
>  * Don't forget to initialize data not at file scope, i.e. within a function,
>  * as gcc otherwise puts the data into the bss section and not into the init
>  * section.
> But maybe that's just outdated as everyone seems to init them at file scope.

I think that comment is just about static variables inside functions?
Here it's at file scope, so that caveat shouldn't apply. As an aside,
you could omit '= false' because it'd zero-init by default.
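
I.e. at file scope, assuming nothing reads the flag after boot, just:

	bool stack_depot_want_early_init __initdata;

should be enough.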

Thanks,
-- Marco


* Re: [PATCH v2 6/6] slab, documentation: add description of debugfs files for SLUB caches
  2022-03-02 17:31 ` [PATCH v2 6/6] slab, documentation: add description of debugfs files for SLUB caches Vlastimil Babka
@ 2022-03-03  8:14   ` Hyeonggon Yoo
  2022-03-03  9:33   ` Mike Rapoport
  1 sibling, 0 replies; 19+ messages in thread
From: Hyeonggon Yoo @ 2022-03-03  8:14 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Marco Elver, Mike Rapoport,
	Imran Khan, Jonathan Corbet, Randy Dunlap, linux-doc

On Wed, Mar 02, 2022 at 06:31:22PM +0100, Vlastimil Babka wrote:
> From: Oliver Glitta <glittao@gmail.com>
> 
> Add description of debugfs files alloc_traces and free_traces
> to SLUB cache documentation.
> 
> [ vbabka@suse.cz: some rewording ]
> 
> Signed-off-by: Oliver Glitta <glittao@gmail.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: linux-doc@vger.kernel.org
> ---
>  Documentation/vm/slub.rst | 64 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
> 
> diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
> index d3028554b1e9..43063ade737a 100644
> --- a/Documentation/vm/slub.rst
> +++ b/Documentation/vm/slub.rst
> @@ -384,5 +384,69 @@ c) Execute ``slabinfo-gnuplot.sh`` in '-t' mode, passing all of the
>        40,60`` range will plot only samples collected between 40th and
>        60th seconds).
>  
> +
> +DebugFS files for SLUB
> +======================
> +
> +For more information about the current state of SLUB caches with the user tracking
> +debug option enabled, debugfs files are available, typically under
> +/sys/kernel/debug/slab/<cache>/ (created only for caches with user
> +tracking). There are 2 types of these files with the following debug
> +information:
> +
> +1. alloc_traces::
> +
> +    Prints information about unique allocation traces of the currently
> +    allocated objects. The output is sorted by frequency of each trace.
> +
> +    Information in the output:
> +    Number of objects, allocating function, minimal/average/maximal jiffies since alloc,
> +    pid range of the allocating processes, cpu mask of allocating cpus, and stack trace.
> +
> +    Example:::
> +
> +    1085 populate_error_injection_list+0x97/0x110 age=166678/166680/166682 pid=1 cpus=1::
> +	__slab_alloc+0x6d/0x90
> +	kmem_cache_alloc_trace+0x2eb/0x300
> +	populate_error_injection_list+0x97/0x110
> +	init_error_injection+0x1b/0x71
> +	do_one_initcall+0x5f/0x2d0
> +	kernel_init_freeable+0x26f/0x2d7
> +	kernel_init+0xe/0x118
> +	ret_from_fork+0x22/0x30
> +
> +
> +2. free_traces::
> +
> +    Prints information about unique freeing traces of the currently allocated
> +    objects. The freeing traces thus come from the previous life-cycle of the
> +    objects and are reported as not available for objects allocated for the first
> +    time. The output is sorted by frequency of each trace.
> +
> +    Information in the output:
> +    Number of objects, freeing function, minimal/average/maximal jiffies since free,
> +    pid range of the freeing processes, cpu mask of freeing cpus, and stack trace.
> +
> +    Example:::
> +
> +    1980 <not-available> age=4294912290 pid=0 cpus=0
> +    51 acpi_ut_update_ref_count+0x6a6/0x782 age=236886/237027/237772 pid=1 cpus=1
> +	kfree+0x2db/0x420
> +	acpi_ut_update_ref_count+0x6a6/0x782
> +	acpi_ut_update_object_reference+0x1ad/0x234
> +	acpi_ut_remove_reference+0x7d/0x84
> +	acpi_rs_get_prt_method_data+0x97/0xd6
> +	acpi_get_irq_routing_table+0x82/0xc4
> +	acpi_pci_irq_find_prt_entry+0x8e/0x2e0
> +	acpi_pci_irq_lookup+0x3a/0x1e0
> +	acpi_pci_irq_enable+0x77/0x240
> +	pcibios_enable_device+0x39/0x40
> +	do_pci_enable_device.part.0+0x5d/0xe0
> +	pci_enable_device_flags+0xfc/0x120
> +	pci_enable_device+0x13/0x20
> +	virtio_pci_probe+0x9e/0x170
> +	local_pci_probe+0x48/0x80
> +	pci_device_probe+0x105/0x1c0
> +

Looks very good.
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

>  Christoph Lameter, May 30, 2007
>  Sergey Senozhatsky, October 23, 2015
> -- 
> 2.35.1
> 

-- 
Thank you, You are awesome!
Hyeonggon :-)


* Re: [PATCH v2 6/6] slab, documentation: add description of debugfs files for SLUB caches
  2022-03-02 17:31 ` [PATCH v2 6/6] slab, documentation: add description of debugfs files for SLUB caches Vlastimil Babka
  2022-03-03  8:14   ` Hyeonggon Yoo
@ 2022-03-03  9:33   ` Mike Rapoport
  1 sibling, 0 replies; 19+ messages in thread
From: Mike Rapoport @ 2022-03-03  9:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Marco Elver, Hyeonggon Yoo,
	Imran Khan, Jonathan Corbet, Randy Dunlap, linux-doc

On Wed, Mar 02, 2022 at 06:31:22PM +0100, Vlastimil Babka wrote:
> From: Oliver Glitta <glittao@gmail.com>
> 
> Add description of debugfs files alloc_traces and free_traces
> to SLUB cache documentation.
> 
> [ vbabka@suse.cz: some rewording ]
> 
> Signed-off-by: Oliver Glitta <glittao@gmail.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: linux-doc@vger.kernel.org

Acked-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  Documentation/vm/slub.rst | 64 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
> 
> diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
> index d3028554b1e9..43063ade737a 100644
> --- a/Documentation/vm/slub.rst
> +++ b/Documentation/vm/slub.rst
> @@ -384,5 +384,69 @@ c) Execute ``slabinfo-gnuplot.sh`` in '-t' mode, passing all of the
>        40,60`` range will plot only samples collected between 40th and
>        60th seconds).
>  
> +
> +DebugFS files for SLUB
> +======================
> +
> +For more information about the current state of SLUB caches with the user tracking
> +debug option enabled, debugfs files are available, typically under
> +/sys/kernel/debug/slab/<cache>/ (created only for caches with user
> +tracking). There are 2 types of these files with the following debug
> +information:
> +
> +1. alloc_traces::
> +
> +    Prints information about unique allocation traces of the currently
> +    allocated objects. The output is sorted by frequency of each trace.
> +
> +    Information in the output:
> +    Number of objects, allocating function, minimal/average/maximal jiffies since alloc,
> +    pid range of the allocating processes, cpu mask of allocating cpus, and stack trace.
> +
> +    Example:::
> +
> +    1085 populate_error_injection_list+0x97/0x110 age=166678/166680/166682 pid=1 cpus=1::
> +	__slab_alloc+0x6d/0x90
> +	kmem_cache_alloc_trace+0x2eb/0x300
> +	populate_error_injection_list+0x97/0x110
> +	init_error_injection+0x1b/0x71
> +	do_one_initcall+0x5f/0x2d0
> +	kernel_init_freeable+0x26f/0x2d7
> +	kernel_init+0xe/0x118
> +	ret_from_fork+0x22/0x30
> +
> +
> +2. free_traces::
> +
> +    Prints information about unique freeing traces of the currently allocated
> +    objects. The freeing traces thus come from the previous life-cycle of the
> +    objects and are reported as not available for objects allocated for the first
> +    time. The output is sorted by frequency of each trace.
> +
> +    Information in the output:
> +    Number of objects, freeing function, minimal/average/maximal jiffies since free,
> +    pid range of the freeing processes, cpu mask of freeing cpus, and stack trace.
> +
> +    Example:::
> +
> +    1980 <not-available> age=4294912290 pid=0 cpus=0
> +    51 acpi_ut_update_ref_count+0x6a6/0x782 age=236886/237027/237772 pid=1 cpus=1
> +	kfree+0x2db/0x420
> +	acpi_ut_update_ref_count+0x6a6/0x782
> +	acpi_ut_update_object_reference+0x1ad/0x234
> +	acpi_ut_remove_reference+0x7d/0x84
> +	acpi_rs_get_prt_method_data+0x97/0xd6
> +	acpi_get_irq_routing_table+0x82/0xc4
> +	acpi_pci_irq_find_prt_entry+0x8e/0x2e0
> +	acpi_pci_irq_lookup+0x3a/0x1e0
> +	acpi_pci_irq_enable+0x77/0x240
> +	pcibios_enable_device+0x39/0x40
> +	do_pci_enable_device.part.0+0x5d/0xe0
> +	pci_enable_device_flags+0xfc/0x120
> +	pci_enable_device+0x13/0x20
> +	virtio_pci_probe+0x9e/0x170
> +	local_pci_probe+0x48/0x80
> +	pci_device_probe+0x105/0x1c0
> +
>  Christoph Lameter, May 30, 2007
>  Sergey Senozhatsky, October 23, 2015
> -- 
> 2.35.1
> 

-- 
Sincerely yours,
Mike.


* [PATCH v3r0 1/6] lib/stackdepot: allow requesting early initialization dynamically
  2022-03-02 17:31 ` [PATCH v2 1/6] lib/stackdepot: allow requesting early initialization dynamically Vlastimil Babka
  2022-03-02 17:47   ` Marco Elver
  2022-03-02 18:01   ` Mike Rapoport
@ 2022-03-03 19:19   ` Vlastimil Babka
  2022-03-04  9:18     ` Marco Elver
                       ` (2 more replies)
  2 siblings, 3 replies; 19+ messages in thread
From: Vlastimil Babka @ 2022-03-03 19:19 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Faiyaz Mohammed, Marco Elver, Mike Rapoport, Hyeonggon Yoo,
	Imran Khan

Here's an updated version based on feedback so I don't re-send everything
so soon after v2. Also in git:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v3r0
Otherwise there are only trivial adaptations in patch 3/6 to the stack
depot init changes.
Thanks, Vlastimil

----8<----
From 230ebae0f83540574d167f9ba1f71d3f602ca410 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Wed, 2 Mar 2022 12:02:22 +0100
Subject: [PATCH v3 1/6] lib/stackdepot: allow requesting early initialization
 dynamically

In a later patch we want to add stackdepot support for object owner
tracking in slub caches, which is enabled by the slub_debug boot parameter.
This creates a bootstrap problem as some caches are created early in
boot when slab_is_available() is false and thus stack_depot_init()
tries to use memblock. But, as reported by Hyeonggon Yoo [1], we are
already beyond memblock_free_all(). Ideally memblock allocation should
fail, yet it succeeds, but later the system crashes, which is a
separately handled issue.

To resolve this bootstrap issue in a robust way, this patch adds another
way to request stack_depot_early_init(), which happens at a well-defined
point in time. In addition to build-time CONFIG_STACKDEPOT_ALWAYS_INIT,
code that's e.g. processing boot parameters (which happens early enough)
can call a new function stack_depot_want_early_init(), which sets a flag
that stack_depot_early_init() will check.

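As an illustration, the request pattern looks roughly like this (a minimal
sketch with a hypothetical "foo" subsystem and boot parameter; the
page_owner conversion below is the real instance in this patch):

	static bool foo_tracking_enabled;

	static int __init early_foo_param(char *buf)
	{
		int ret = kstrtobool(buf, &foo_tracking_enabled);

		/*
		 * early_param() handlers run early enough that this
		 * request still precedes stack_depot_early_init().
		 */
		if (foo_tracking_enabled)
			stack_depot_want_early_init();

		return ret;
	}
	early_param("foo_tracking", early_foo_param);
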
In this patch we also convert page_owner to this approach. While it
doesn't have the same bootstrap issue as slub, it's also functionality
enabled by a boot param and can thus request stack_depot_early_init()
with memblock allocation instead of later initialization with
kvmalloc().

As suggested by Mike, make stack_depot_early_init() only attempt
memblock allocation and stack_depot_init() only attempt kvmalloc().
Also change the latter to kvcalloc(). In both cases we can lose the
explicit array zeroing, which the allocations do already.

As suggested by Marco, provide empty implementations of the init
functions for !CONFIG_STACKDEPOT builds to simplify the callers.

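As an illustration of the caller-side simplification (a sketch only; this
is the kmem_cache_create_usercopy() case discussed later in the thread):

	/* the stub is a no-op returning 0 when !CONFIG_STACKDEPOT */
	if (flags & SLAB_STORE_USER)
		stack_depot_init();
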
[1] https://lore.kernel.org/all/YhnUcqyeMgCrWZbd@ip-172-31-19-208.ap-northeast-1.compute.internal/

Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/stackdepot.h | 26 ++++++++++++---
 lib/stackdepot.c           | 67 +++++++++++++++++++++++++-------------
 mm/page_owner.c            |  9 +++--
 3 files changed, 73 insertions(+), 29 deletions(-)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 17f992fe6355..bc2797955de9 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -20,18 +20,36 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 					gfp_t gfp_flags, bool can_alloc);
 
 /*
- * Every user of stack depot has to call this during its own init when it's
- * decided that it will be calling stack_depot_save() later.
+ * Every user of stack depot has to call stack_depot_init() during its own init
+ * when it's decided that it will be calling stack_depot_save() later. This is
+ * recommended for e.g. modules initialized later in the boot process, when
+ * slab_is_available() is true.
  *
  * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot
  * enabled as part of mm_init(), for subsystems where it's known at compile time
  * that stack depot will be used.
+ *
+ * Another alternative is to call stack_depot_want_early_init(), when the
+ * decision to use stack depot is taken e.g. when evaluating kernel boot
+ * parameters, which precedes the enablement point in mm_init().
+ *
+ * stack_depot_init() and stack_depot_want_early_init() can be called regardless
+ * of CONFIG_STACKDEPOT and are no-op when disabled. The actual save/fetch/print
+ * functions should only be called from code that makes sure CONFIG_STACKDEPOT
+ * is enabled.
  */
+#ifdef CONFIG_STACKDEPOT
 int stack_depot_init(void);
 
-#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
-static inline int stack_depot_early_init(void)	{ return stack_depot_init(); }
+void __init stack_depot_want_early_init(void);
+
+/* This is supposed to be called only from mm_init() */
+int __init stack_depot_early_init(void);
 #else
+static inline int stack_depot_init(void) { return 0; }
+
+static inline void stack_depot_want_early_init(void) { }
+
 static inline int stack_depot_early_init(void)	{ return 0; }
 #endif
 
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index bf5ba9af0500..0a5916f1e7a3 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -66,6 +66,9 @@ struct stack_record {
 	unsigned long entries[];	/* Variable-sized array of entries. */
 };
 
+static bool __stack_depot_want_early_init __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT);
+static bool __stack_depot_early_init_passed __initdata;
+
 static void *stack_slabs[STACK_ALLOC_MAX_SLABS];
 
 static int depot_index;
@@ -162,38 +165,58 @@ static int __init is_stack_depot_disabled(char *str)
 }
 early_param("stack_depot_disable", is_stack_depot_disabled);
 
-/*
- * __ref because of memblock_alloc(), which will not be actually called after
- * the __init code is gone, because at that point slab_is_available() is true
- */
-__ref int stack_depot_init(void)
+void __init stack_depot_want_early_init(void)
+{
+	/* Too late to request early init now */
+	WARN_ON(__stack_depot_early_init_passed);
+
+	__stack_depot_want_early_init = true;
+}
+
+int __init stack_depot_early_init(void)
+{
+	size_t size;
+	int i;
+
+	/* This is supposed to be called only once, from mm_init() */
+	if (WARN_ON(__stack_depot_early_init_passed))
+		return 0;
+
+	__stack_depot_early_init_passed = true;
+
+	if (!__stack_depot_want_early_init || stack_depot_disable)
+		return 0;
+
+	pr_info("Stack Depot early init allocating hash table with memblock_alloc\n");
+	size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
+	stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
+
+	if (!stack_table) {
+		pr_err("Stack Depot hash table allocation failed, disabling\n");
+		stack_depot_disable = true;
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+int stack_depot_init(void)
 {
 	static DEFINE_MUTEX(stack_depot_init_mutex);
+	int ret = 0;
 
 	mutex_lock(&stack_depot_init_mutex);
 	if (!stack_depot_disable && !stack_table) {
-		size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
-		int i;
-
-		if (slab_is_available()) {
-			pr_info("Stack Depot allocating hash table with kvmalloc\n");
-			stack_table = kvmalloc(size, GFP_KERNEL);
-		} else {
-			pr_info("Stack Depot allocating hash table with memblock_alloc\n");
-			stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
-		}
-		if (stack_table) {
-			for (i = 0; i < STACK_HASH_SIZE;  i++)
-				stack_table[i] = NULL;
-		} else {
+		pr_info("Stack Depot allocating hash table with kvcalloc\n");
+		stack_table = kvcalloc(STACK_HASH_SIZE, sizeof(struct stack_record *), GFP_KERNEL);
+		if (!stack_table) {
 			pr_err("Stack Depot hash table allocation failed, disabling\n");
 			stack_depot_disable = true;
-			mutex_unlock(&stack_depot_init_mutex);
-			return -ENOMEM;
+			ret = -ENOMEM;
 		}
 	}
 	mutex_unlock(&stack_depot_init_mutex);
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(stack_depot_init);
 
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 99e360df9465..4313f8212a83 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -42,7 +42,12 @@ static void init_early_allocated_pages(void);
 
 static int __init early_page_owner_param(char *buf)
 {
-	return kstrtobool(buf, &page_owner_enabled);
+	int ret = kstrtobool(buf, &page_owner_enabled);
+
+	if (page_owner_enabled)
+		stack_depot_want_early_init();
+
+	return ret;
 }
 early_param("page_owner", early_page_owner_param);
 
@@ -80,8 +85,6 @@ static __init void init_page_owner(void)
 	if (!page_owner_enabled)
 		return;
 
-	stack_depot_init();
-
 	register_dummy_stack();
 	register_failure_stack();
 	register_early_stack();
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3r0 1/6] lib/stackdepot: allow requesting early initialization dynamically
  2022-03-03 19:19   ` [PATCH v3r0 " Vlastimil Babka
@ 2022-03-04  9:18     ` Marco Elver
  2022-03-04 10:47     ` Hyeonggon Yoo
  2022-03-04 11:02     ` Mike Rapoport
  2 siblings, 0 replies; 19+ messages in thread
From: Marco Elver @ 2022-03-04  9:18 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Mike Rapoport, Hyeonggon Yoo,
	Imran Khan

On Thu, 3 Mar 2022 at 20:19, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> Here's an updated version based on feedback so I don't re-send everything
> so soon after v2. Also in git:
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v3r0
> Otherwise there are only trivial adaptations in patch 3/6 to the stack
> depot init changes.
> Thanks, Vlastimil
>
> ----8<----
> From 230ebae0f83540574d167f9ba1f71d3f602ca410 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Wed, 2 Mar 2022 12:02:22 +0100
> Subject: [PATCH v3 1/6] lib/stackdepot: allow requesting early initialization
>  dynamically
>
> In a later patch we want to add stackdepot support for object owner
> tracking in slub caches, which is enabled by the slub_debug boot parameter.
> This creates a bootstrap problem as some caches are created early in
> boot when slab_is_available() is false and thus stack_depot_init()
> tries to use memblock. But, as reported by Hyeonggon Yoo [1], we are
> already beyond memblock_free_all(). Ideally memblock allocation should
> fail, yet it succeeds, but later the system crashes, which is a
> separately handled issue.
>
> To resolve this bootstrap issue in a robust way, this patch adds another
> way to request stack_depot_early_init(), which happens at a well-defined
> point in time. In addition to build-time CONFIG_STACKDEPOT_ALWAYS_INIT,
> code that's e.g. processing boot parameters (which happens early enough)
> can call a new function stack_depot_want_early_init(), which sets a flag
> that stack_depot_early_init() will check.
>
> In this patch we also convert page_owner to this approach. While it
> doesn't have the same bootstrap issue as slub, it's also functionality
> enabled by a boot param and can thus request stack_depot_early_init()
> with memblock allocation instead of later initialization with
> kvmalloc().
>
> As suggested by Mike, make stack_depot_early_init() only attempt
> memblock allocation and stack_depot_init() only attempt kvmalloc().
> Also change the latter to kvcalloc(). In both cases we can lose the
> explicit array zeroing, which the allocations do already.
>
> As suggested by Marco, provide empty implementations of the init
> functions for !CONFIG_STACKDEPOT builds to simplify the callers.
>
> [1] https://lore.kernel.org/all/YhnUcqyeMgCrWZbd@ip-172-31-19-208.ap-northeast-1.compute.internal/
>
> Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
> Suggested-by: Marco Elver <elver@google.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Looks good, thanks!

Reviewed-by: Marco Elver <elver@google.com>

> ---
>  include/linux/stackdepot.h | 26 ++++++++++++---
>  lib/stackdepot.c           | 67 +++++++++++++++++++++++++-------------
>  mm/page_owner.c            |  9 +++--
>  3 files changed, 73 insertions(+), 29 deletions(-)
>
> diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
> index 17f992fe6355..bc2797955de9 100644
> --- a/include/linux/stackdepot.h
> +++ b/include/linux/stackdepot.h
> @@ -20,18 +20,36 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>                                         gfp_t gfp_flags, bool can_alloc);
>
>  /*
> - * Every user of stack depot has to call this during its own init when it's
> - * decided that it will be calling stack_depot_save() later.
> + * Every user of stack depot has to call stack_depot_init() during its own init
> + * when it's decided that it will be calling stack_depot_save() later. This is
> + * recommended for e.g. modules initialized later in the boot process, when
> + * slab_is_available() is true.
>   *
>   * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot
>   * enabled as part of mm_init(), for subsystems where it's known at compile time
>   * that stack depot will be used.
> + *
> + * Another alternative is to call stack_depot_want_early_init(), when the
> + * decision to use stack depot is taken e.g. when evaluating kernel boot
> + * parameters, which precedes the enablement point in mm_init().
> + *
> + * stack_depot_init() and stack_depot_want_early_init() can be called regardless
> + * of CONFIG_STACKDEPOT and are no-op when disabled. The actual save/fetch/print
> + * functions should only be called from code that makes sure CONFIG_STACKDEPOT
> + * is enabled.
>   */
> +#ifdef CONFIG_STACKDEPOT
>  int stack_depot_init(void);
>
> -#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
> -static inline int stack_depot_early_init(void) { return stack_depot_init(); }
> +void __init stack_depot_want_early_init(void);
> +
> +/* This is supposed to be called only from mm_init() */
> +int __init stack_depot_early_init(void);
>  #else
> +static inline int stack_depot_init(void) { return 0; }
> +
> +static inline void stack_depot_want_early_init(void) { }
> +
>  static inline int stack_depot_early_init(void) { return 0; }
>  #endif
>
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index bf5ba9af0500..0a5916f1e7a3 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -66,6 +66,9 @@ struct stack_record {
>         unsigned long entries[];        /* Variable-sized array of entries. */
>  };
>
> +static bool __stack_depot_want_early_init __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT);
> +static bool __stack_depot_early_init_passed __initdata;
> +
>  static void *stack_slabs[STACK_ALLOC_MAX_SLABS];
>
>  static int depot_index;
> @@ -162,38 +165,58 @@ static int __init is_stack_depot_disabled(char *str)
>  }
>  early_param("stack_depot_disable", is_stack_depot_disabled);
>
> -/*
> - * __ref because of memblock_alloc(), which will not be actually called after
> - * the __init code is gone, because at that point slab_is_available() is true
> - */
> -__ref int stack_depot_init(void)
> +void __init stack_depot_want_early_init(void)
> +{
> +       /* Too late to request early init now */
> +       WARN_ON(__stack_depot_early_init_passed);
> +
> +       __stack_depot_want_early_init = true;
> +}
> +
> +int __init stack_depot_early_init(void)
> +{
> +       size_t size;
> +       int i;
> +
> +       /* This is supposed to be called only once, from mm_init() */
> +       if (WARN_ON(__stack_depot_early_init_passed))
> +               return 0;
> +
> +       __stack_depot_early_init_passed = true;
> +
> +       if (!__stack_depot_want_early_init || stack_depot_disable)
> +               return 0;
> +
> +       pr_info("Stack Depot early init allocating hash table with memblock_alloc\n");
> +       size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
> +       stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
> +
> +       if (!stack_table) {
> +               pr_err("Stack Depot hash table allocation failed, disabling\n");
> +               stack_depot_disable = true;
> +               return -ENOMEM;
> +       }
> +
> +       return 0;
> +}
> +
> +int stack_depot_init(void)
>  {
>         static DEFINE_MUTEX(stack_depot_init_mutex);
> +       int ret = 0;
>
>         mutex_lock(&stack_depot_init_mutex);
>         if (!stack_depot_disable && !stack_table) {
> -               size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
> -               int i;
> -
> -               if (slab_is_available()) {
> -                       pr_info("Stack Depot allocating hash table with kvmalloc\n");
> -                       stack_table = kvmalloc(size, GFP_KERNEL);
> -               } else {
> -                       pr_info("Stack Depot allocating hash table with memblock_alloc\n");
> -                       stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
> -               }
> -               if (stack_table) {
> -                       for (i = 0; i < STACK_HASH_SIZE;  i++)
> -                               stack_table[i] = NULL;
> -               } else {
> +               pr_info("Stack Depot allocating hash table with kvcalloc\n");
> +               stack_table = kvcalloc(STACK_HASH_SIZE, sizeof(struct stack_record *), GFP_KERNEL);
> +               if (!stack_table) {
>                         pr_err("Stack Depot hash table allocation failed, disabling\n");
>                         stack_depot_disable = true;
> -                       mutex_unlock(&stack_depot_init_mutex);
> -                       return -ENOMEM;
> +                       ret = -ENOMEM;
>                 }
>         }
>         mutex_unlock(&stack_depot_init_mutex);
> -       return 0;
> +       return ret;
>  }
>  EXPORT_SYMBOL_GPL(stack_depot_init);
>
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 99e360df9465..4313f8212a83 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -42,7 +42,12 @@ static void init_early_allocated_pages(void);
>
>  static int __init early_page_owner_param(char *buf)
>  {
> -       return kstrtobool(buf, &page_owner_enabled);
> +       int ret = kstrtobool(buf, &page_owner_enabled);
> +
> +       if (page_owner_enabled)
> +               stack_depot_want_early_init();
> +
> +       return ret;
>  }
>  early_param("page_owner", early_page_owner_param);
>
> @@ -80,8 +85,6 @@ static __init void init_page_owner(void)
>         if (!page_owner_enabled)
>                 return;
>
> -       stack_depot_init();
> -
>         register_dummy_stack();
>         register_failure_stack();
>         register_early_stack();
> --
> 2.35.1
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3r0 1/6] lib/stackdepot: allow requesting early initialization dynamically
  2022-03-03 19:19   ` [PATCH v3r0 " Vlastimil Babka
  2022-03-04  9:18     ` Marco Elver
@ 2022-03-04 10:47     ` Hyeonggon Yoo
  2022-03-04 11:02     ` Mike Rapoport
  2 siblings, 0 replies; 19+ messages in thread
From: Hyeonggon Yoo @ 2022-03-04 10:47 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Marco Elver, Mike Rapoport,
	Imran Khan

On Thu, Mar 03, 2022 at 08:19:33PM +0100, Vlastimil Babka wrote:
> Here's an updated version based on feedback so I don't re-send everything
> so soon after v2. Also in git:
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v3r0
> Otherwise there are only trivial adaptations in patch 3/6 to the stack
> depot init changes.
> Thanks, Vlastimil
> 
> ----8<----
> From 230ebae0f83540574d167f9ba1f71d3f602ca410 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Wed, 2 Mar 2022 12:02:22 +0100
> Subject: [PATCH v3 1/6] lib/stackdepot: allow requesting early initialization
>  dynamically
> 
> In a later patch we want to add stackdepot support for object owner
> tracking in slub caches, which is enabled by the slub_debug boot parameter.
> This creates a bootstrap problem as some caches are created early in
> boot when slab_is_available() is false and thus stack_depot_init()
> tries to use memblock. But, as reported by Hyeonggon Yoo [1], we are
> already beyond memblock_free_all(). Ideally memblock allocation should
> fail, yet it succeeds, but later the system crashes, which is a
> separately handled issue.
> 
> To resolve this bootstrap issue in a robust way, this patch adds another
> way to request stack_depot_early_init(), which happens at a well-defined
> point in time. In addition to build-time CONFIG_STACKDEPOT_ALWAYS_INIT,
> code that's e.g. processing boot parameters (which happens early enough)
> can call a new function stack_depot_want_early_init(), which sets a flag
> that stack_depot_early_init() will check.
> 
> In this patch we also convert page_owner to this approach. While it
> doesn't have the same bootstrap issue as slub, it's also functionality
> enabled by a boot param and can thus request stack_depot_early_init()
> with memblock allocation instead of later initialization with
> kvmalloc().
> 
> As suggested by Mike, make stack_depot_early_init() only attempt
> memblock allocation and stack_depot_init() only attempt kvmalloc().
> Also change the latter to kvcalloc(). In both cases we can lose the
> explicit array zeroing, which the allocations do already.
> 
> As suggested by Marco, provide empty implementations of the init
> functions for !CONFIG_STACKDEPOT builds to simplify the callers.
> 
> [1] https://lore.kernel.org/all/YhnUcqyeMgCrWZbd@ip-172-31-19-208.ap-northeast-1.compute.internal/
> 
> Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
> Suggested-by: Marco Elver <elver@google.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  include/linux/stackdepot.h | 26 ++++++++++++---
>  lib/stackdepot.c           | 67 +++++++++++++++++++++++++-------------
>  mm/page_owner.c            |  9 +++--
>  3 files changed, 73 insertions(+), 29 deletions(-)
> 
> diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
> index 17f992fe6355..bc2797955de9 100644
> --- a/include/linux/stackdepot.h
> +++ b/include/linux/stackdepot.h
> @@ -20,18 +20,36 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>  					gfp_t gfp_flags, bool can_alloc);
>  
>  /*
> - * Every user of stack depot has to call this during its own init when it's
> - * decided that it will be calling stack_depot_save() later.
> + * Every user of stack depot has to call stack_depot_init() during its own init
> + * when it's decided that it will be calling stack_depot_save() later. This is
> + * recommended for e.g. modules initialized later in the boot process, when
> + * slab_is_available() is true.
>   *
>   * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot
>   * enabled as part of mm_init(), for subsystems where it's known at compile time
>   * that stack depot will be used.
> + *
> + * Another alternative is to call stack_depot_want_early_init(), when the
> + * decision to use stack depot is taken e.g. when evaluating kernel boot
> + * parameters, which precedes the enablement point in mm_init().
> + *
> + * stack_depot_init() and stack_depot_want_early_init() can be called regardless
> + * of CONFIG_STACKDEPOT and are no-op when disabled. The actual save/fetch/print
> + * functions should only be called from code that makes sure CONFIG_STACKDEPOT
> + * is enabled.
>   */
> +#ifdef CONFIG_STACKDEPOT
>  int stack_depot_init(void);
>  
> -#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
> -static inline int stack_depot_early_init(void)	{ return stack_depot_init(); }
> +void __init stack_depot_want_early_init(void);
> +
> +/* This is supposed to be called only from mm_init() */
> +int __init stack_depot_early_init(void);
>  #else
> +static inline int stack_depot_init(void) { return 0; }
> +
> +static inline void stack_depot_want_early_init(void) { }
> +
>  static inline int stack_depot_early_init(void)	{ return 0; }
>  #endif
>  
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index bf5ba9af0500..0a5916f1e7a3 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -66,6 +66,9 @@ struct stack_record {
>  	unsigned long entries[];	/* Variable-sized array of entries. */
>  };
>  
> +static bool __stack_depot_want_early_init __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT);
> +static bool __stack_depot_early_init_passed __initdata;
> +
>  static void *stack_slabs[STACK_ALLOC_MAX_SLABS];
>  
>  static int depot_index;
> @@ -162,38 +165,58 @@ static int __init is_stack_depot_disabled(char *str)
>  }
>  early_param("stack_depot_disable", is_stack_depot_disabled);
>  
> -/*
> - * __ref because of memblock_alloc(), which will not be actually called after
> - * the __init code is gone, because at that point slab_is_available() is true
> - */
> -__ref int stack_depot_init(void)
> +void __init stack_depot_want_early_init(void)
> +{
> +	/* Too late to request early init now */
> +	WARN_ON(__stack_depot_early_init_passed);
> +
> +	__stack_depot_want_early_init = true;
> +}
> +
> +int __init stack_depot_early_init(void)
> +{
> +	size_t size;
> +	int i;

compile warning: unused variable i.

> +	/* This is supposed to be called only once, from mm_init() */
> +	if (WARN_ON(__stack_depot_early_init_passed))
> +		return 0;
> +
> +	__stack_depot_early_init_passed = true;
> +
> +	if (!__stack_depot_want_early_init || stack_depot_disable)
> +		return 0;
> +
> +	pr_info("Stack Depot early init allocating hash table with memblock_alloc\n");
> +	size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
> +	stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
> +
> +	if (!stack_table) {
> +		pr_err("Stack Depot hash table allocation failed, disabling\n");
> +		stack_depot_disable = true;
> +		return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +int stack_depot_init(void)
>  {
>  	static DEFINE_MUTEX(stack_depot_init_mutex);
> +	int ret = 0;
>  
>  	mutex_lock(&stack_depot_init_mutex);
>  	if (!stack_depot_disable && !stack_table) {
> -		size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
> -		int i;
> -
> -		if (slab_is_available()) {
> -			pr_info("Stack Depot allocating hash table with kvmalloc\n");
> -			stack_table = kvmalloc(size, GFP_KERNEL);
> -		} else {
> -			pr_info("Stack Depot allocating hash table with memblock_alloc\n");
> -			stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
> -		}
> -		if (stack_table) {
> -			for (i = 0; i < STACK_HASH_SIZE;  i++)
> -				stack_table[i] = NULL;
> -		} else {
> +		pr_info("Stack Depot allocating hash table with kvcalloc\n");
> +		stack_table = kvcalloc(STACK_HASH_SIZE, sizeof(struct stack_record *), GFP_KERNEL);
> +		if (!stack_table) {
>  			pr_err("Stack Depot hash table allocation failed, disabling\n");
>  			stack_depot_disable = true;
> -			mutex_unlock(&stack_depot_init_mutex);
> -			return -ENOMEM;
> +			ret = -ENOMEM;
>  		}
>  	}
>  	mutex_unlock(&stack_depot_init_mutex);
> -	return 0;
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(stack_depot_init);
>  
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 99e360df9465..4313f8212a83 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -42,7 +42,12 @@ static void init_early_allocated_pages(void);
>  
>  static int __init early_page_owner_param(char *buf)
>  {
> -	return kstrtobool(buf, &page_owner_enabled);
> +	int ret = kstrtobool(buf, &page_owner_enabled);
> +
> +	if (page_owner_enabled)
> +		stack_depot_want_early_init();
> +
> +	return ret;
>  }
>  early_param("page_owner", early_page_owner_param);
>  
> @@ -80,8 +85,6 @@ static __init void init_page_owner(void)
>  	if (!page_owner_enabled)
>  		return;
>  
> -	stack_depot_init();
> -
>  	register_dummy_stack();
>  	register_failure_stack();
>  	register_early_stack();
> -- 
> 2.35.1
> 
>

This approach looks much better!

Otherwise looks good.
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Works as expected on my system.
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

-- 
Thank you, You are awesome!
Hyeonggon :-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3r0 1/6] lib/stackdepot: allow requesting early initialization dynamically
  2022-03-03 19:19   ` [PATCH v3r0 " Vlastimil Babka
  2022-03-04  9:18     ` Marco Elver
  2022-03-04 10:47     ` Hyeonggon Yoo
@ 2022-03-04 11:02     ` Mike Rapoport
  2 siblings, 0 replies; 19+ messages in thread
From: Mike Rapoport @ 2022-03-04 11:02 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Marco Elver, Hyeonggon Yoo,
	Imran Khan

On Thu, Mar 03, 2022 at 08:19:33PM +0100, Vlastimil Babka wrote:
> Here's an updated version based on feedback so I don't re-send everything
> so soon after v2. Also in git:
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v3r0
> Otherwise there are only trivial adaptations in patch 3/6 to the stack
> depot init changes.
> Thanks, Vlastimil
> 
> ----8<----
> From 230ebae0f83540574d167f9ba1f71d3f602ca410 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Wed, 2 Mar 2022 12:02:22 +0100
> Subject: [PATCH v3 1/6] lib/stackdepot: allow requesting early initialization
>  dynamically
> 
> In a later patch we want to add stackdepot support for object owner
> tracking in slub caches, which is enabled by the slub_debug boot parameter.
> This creates a bootstrap problem as some caches are created early in
> boot when slab_is_available() is false and thus stack_depot_init()
> tries to use memblock. But, as reported by Hyeonggon Yoo [1], we are
> already beyond memblock_free_all(). Ideally memblock allocation should
> fail, yet it succeeds, but later the system crashes, which is a
> separately handled issue.
> 
> To resolve this bootstrap issue in a robust way, this patch adds another
> way to request stack_depot_early_init(), which happens at a well-defined
> point in time. In addition to build-time CONFIG_STACKDEPOT_ALWAYS_INIT,
> code that's e.g. processing boot parameters (which happens early enough)
> can call a new function stack_depot_want_early_init(), which sets a flag
> that stack_depot_early_init() will check.
> 
> In this patch we also convert page_owner to this approach. While it
> doesn't have the same bootstrap issue as slub, it's also functionality
> enabled by a boot param and can thus request stack_depot_early_init()
> with memblock allocation instead of later initialization with
> kvmalloc().
> 
> As suggested by Mike, make stack_depot_early_init() only attempt
> memblock allocation and stack_depot_init() only attempt kvmalloc().
> Also change the latter to kvcalloc(). In both cases we can lose the
> explicit array zeroing, which the allocations do already.
> 
> As suggested by Marco, provide empty implementations of the init
> functions for !CONFIG_STACKDEPOT builds to simplify the callers.
> 
> [1] https://lore.kernel.org/all/YhnUcqyeMgCrWZbd@ip-172-31-19-208.ap-northeast-1.compute.internal/
> 
> Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
> Suggested-by: Marco Elver <elver@google.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Thanks, Vlastimil!

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  include/linux/stackdepot.h | 26 ++++++++++++---
>  lib/stackdepot.c           | 67 +++++++++++++++++++++++++-------------
>  mm/page_owner.c            |  9 +++--
>  3 files changed, 73 insertions(+), 29 deletions(-)
> 
> diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
> index 17f992fe6355..bc2797955de9 100644
> --- a/include/linux/stackdepot.h
> +++ b/include/linux/stackdepot.h
> @@ -20,18 +20,36 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>  					gfp_t gfp_flags, bool can_alloc);
>  
>  /*
> - * Every user of stack depot has to call this during its own init when it's
> - * decided that it will be calling stack_depot_save() later.
> + * Every user of stack depot has to call stack_depot_init() during its own init
> + * when it's decided that it will be calling stack_depot_save() later. This is
> + * recommended for e.g. modules initialized later in the boot process, when
> + * slab_is_available() is true.
>   *
>   * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot
>   * enabled as part of mm_init(), for subsystems where it's known at compile time
>   * that stack depot will be used.
> + *
> + * Another alternative is to call stack_depot_want_early_init(), when the
> + * decision to use stack depot is taken e.g. when evaluating kernel boot
> + * parameters, which precedes the enablement point in mm_init().
> + *
> + * stack_depot_init() and stack_depot_want_early_init() can be called regardless
> + * of CONFIG_STACKDEPOT and are no-op when disabled. The actual save/fetch/print
> + * functions should only be called from code that makes sure CONFIG_STACKDEPOT
> + * is enabled.
>   */
> +#ifdef CONFIG_STACKDEPOT
>  int stack_depot_init(void);
>  
> -#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
> -static inline int stack_depot_early_init(void)	{ return stack_depot_init(); }
> +void __init stack_depot_want_early_init(void);
> +
> +/* This is supposed to be called only from mm_init() */
> +int __init stack_depot_early_init(void);
>  #else
> +static inline int stack_depot_init(void) { return 0; }
> +
> +static inline void stack_depot_want_early_init(void) { }
> +
>  static inline int stack_depot_early_init(void)	{ return 0; }
>  #endif
>  
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index bf5ba9af0500..0a5916f1e7a3 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -66,6 +66,9 @@ struct stack_record {
>  	unsigned long entries[];	/* Variable-sized array of entries. */
>  };
>  
> +static bool __stack_depot_want_early_init __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT);
> +static bool __stack_depot_early_init_passed __initdata;
> +
>  static void *stack_slabs[STACK_ALLOC_MAX_SLABS];
>  
>  static int depot_index;
> @@ -162,38 +165,58 @@ static int __init is_stack_depot_disabled(char *str)
>  }
>  early_param("stack_depot_disable", is_stack_depot_disabled);
>  
> -/*
> - * __ref because of memblock_alloc(), which will not be actually called after
> - * the __init code is gone, because at that point slab_is_available() is true
> - */
> -__ref int stack_depot_init(void)
> +void __init stack_depot_want_early_init(void)
> +{
> +	/* Too late to request early init now */
> +	WARN_ON(__stack_depot_early_init_passed);
> +
> +	__stack_depot_want_early_init = true;
> +}
> +
> +int __init stack_depot_early_init(void)
> +{
> +	size_t size;
> +	int i;
> +
> +	/* This is supposed to be called only once, from mm_init() */
> +	if (WARN_ON(__stack_depot_early_init_passed))
> +		return 0;
> +
> +	__stack_depot_early_init_passed = true;
> +
> +	if (!__stack_depot_want_early_init || stack_depot_disable)
> +		return 0;
> +
> +	pr_info("Stack Depot early init allocating hash table with memblock_alloc\n");
> +	size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
> +	stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
> +
> +	if (!stack_table) {
> +		pr_err("Stack Depot hash table allocation failed, disabling\n");
> +		stack_depot_disable = true;
> +		return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +int stack_depot_init(void)
>  {
>  	static DEFINE_MUTEX(stack_depot_init_mutex);
> +	int ret = 0;
>  
>  	mutex_lock(&stack_depot_init_mutex);
>  	if (!stack_depot_disable && !stack_table) {
> -		size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
> -		int i;
> -
> -		if (slab_is_available()) {
> -			pr_info("Stack Depot allocating hash table with kvmalloc\n");
> -			stack_table = kvmalloc(size, GFP_KERNEL);
> -		} else {
> -			pr_info("Stack Depot allocating hash table with memblock_alloc\n");
> -			stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
> -		}
> -		if (stack_table) {
> -			for (i = 0; i < STACK_HASH_SIZE;  i++)
> -				stack_table[i] = NULL;
> -		} else {
> +		pr_info("Stack Depot allocating hash table with kvcalloc\n");
> +		stack_table = kvcalloc(STACK_HASH_SIZE, sizeof(struct stack_record *), GFP_KERNEL);
> +		if (!stack_table) {
>  			pr_err("Stack Depot hash table allocation failed, disabling\n");
>  			stack_depot_disable = true;
> -			mutex_unlock(&stack_depot_init_mutex);
> -			return -ENOMEM;
> +			ret = -ENOMEM;
>  		}
>  	}
>  	mutex_unlock(&stack_depot_init_mutex);
> -	return 0;
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(stack_depot_init);
>  
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 99e360df9465..4313f8212a83 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -42,7 +42,12 @@ static void init_early_allocated_pages(void);
>  
>  static int __init early_page_owner_param(char *buf)
>  {
> -	return kstrtobool(buf, &page_owner_enabled);
> +	int ret = kstrtobool(buf, &page_owner_enabled);
> +
> +	if (page_owner_enabled)
> +		stack_depot_want_early_init();
> +
> +	return ret;
>  }
>  early_param("page_owner", early_page_owner_param);
>  
> @@ -80,8 +85,6 @@ static __init void init_page_owner(void)
>  	if (!page_owner_enabled)
>  		return;
>  
> -	stack_depot_init();
> -
>  	register_dummy_stack();
>  	register_failure_stack();
>  	register_early_stack();
> -- 
> 2.35.1
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/6] mm/slub: use stackdepot to save stack trace in objects
  2022-03-02 17:31 ` [PATCH v2 3/6] mm/slub: use stackdepot to save stack trace in objects Vlastimil Babka
@ 2022-03-04 11:25   ` Hyeonggon Yoo
  2022-03-04 12:10     ` Vlastimil Babka
  0 siblings, 1 reply; 19+ messages in thread
From: Hyeonggon Yoo @ 2022-03-04 11:25 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Marco Elver, Mike Rapoport,
	Imran Khan

On Wed, Mar 02, 2022 at 06:31:19PM +0100, Vlastimil Babka wrote:
> From: Oliver Glitta <glittao@gmail.com>
> 
> Many stack traces are similar so there are many similar arrays.
> Stackdepot saves each unique stack only once.
> 
> Replace field addrs in struct track with depot_stack_handle_t handle.  Use
> stackdepot to save stack trace.
> 
> The benefits are smaller memory overhead and the possibility to aggregate
> per-cache statistics in the following patch using the stackdepot handle
> instead of matching stacks manually.
> 
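To sketch the dedup pattern the diff below implements (names taken from
the diff; identical stack traces yield the same handle, so handles can
be compared directly instead of matching whole arrays):

	unsigned long entries[TRACK_ADDRS_COUNT];
	unsigned int nr_entries;
	depot_stack_handle_t handle;

	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);
	handle = stack_depot_save(entries, nr_entries, GFP_NOWAIT);
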
> [ vbabka@suse.cz: rebase to 5.17-rc1 and adjust accordingly ]
> 
> This was initially merged as commit 788691464c29 and reverted by commit
> ae14c63a9f20 due to several issues that should now be fixed.
> The problem of unconditional memory overhead by stackdepot has been
> addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init
> and stack_table allocation by kvmalloc()"), so the dependency on
> stackdepot will result in extra memory usage only when a slab cache
> tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
> The build failures on some architectures were also addressed, and the
> reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this
> patch.
> 
> Signed-off-by: Oliver Glitta <glittao@gmail.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  init/Kconfig     |  1 +
>  mm/slab_common.c |  5 ++++
>  mm/slub.c        | 71 +++++++++++++++++++++++++++---------------------
>  3 files changed, 46 insertions(+), 31 deletions(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index e9119bf54b1f..b21dd3a4a106 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1871,6 +1871,7 @@ config SLUB_DEBUG
>  	default y
>  	bool "Enable SLUB debugging support" if EXPERT
>  	depends on SLUB && SYSFS
> +	select STACKDEPOT if STACKTRACE_SUPPORT
>  	help
>  	  SLUB has extensive debug support features. Disabling these can
>  	  result in significant savings in code size. This also disables
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 23f2ab0713b7..e51d50d03000 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -24,6 +24,7 @@
>  #include <asm/tlbflush.h>
>  #include <asm/page.h>
>  #include <linux/memcontrol.h>
> +#include <linux/stackdepot.h>
>  
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/kmem.h>
> @@ -314,9 +315,13 @@ kmem_cache_create_usercopy(const char *name,
>  	 * If no slub_debug was enabled globally, the static key is not yet
>  	 * enabled by setup_slub_debug(). Enable it if the cache is being
>  	 * created with any of the debugging flags passed explicitly.
> +	 * It's also possible that this is the first cache created with
> +	 * SLAB_STORE_USER and we should init stack_depot for it.
>  	 */
>  	if (flags & SLAB_DEBUG_FLAGS)
>  		static_branch_enable(&slub_debug_enabled);
> +	if (flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> +		stack_depot_init();
>  #endif

Is this comment and code still valid in v3?

>  	mutex_lock(&slab_mutex);
> diff --git a/mm/slub.c b/mm/slub.c
> index 1fc451f4fe62..42cb79af70a0 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -26,6 +26,7 @@
>  #include <linux/cpuset.h>
>  #include <linux/mempolicy.h>
>  #include <linux/ctype.h>
> +#include <linux/stackdepot.h>
>  #include <linux/debugobjects.h>
>  #include <linux/kallsyms.h>
>  #include <linux/kfence.h>
> @@ -264,8 +265,8 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
>  #define TRACK_ADDRS_COUNT 16
>  struct track {
>  	unsigned long addr;	/* Called from address */
> -#ifdef CONFIG_STACKTRACE
> -	unsigned long addrs[TRACK_ADDRS_COUNT];	/* Called from address */
> +#ifdef CONFIG_STACKDEPOT
> +	depot_stack_handle_t handle;
>  #endif
>  	int cpu;		/* Was running on cpu */
>  	int pid;		/* Pid context */
> @@ -724,22 +725,19 @@ static struct track *get_track(struct kmem_cache *s, void *object,
>  	return kasan_reset_tag(p + alloc);
>  }
>  
> -static void set_track(struct kmem_cache *s, void *object,
> +static void noinline set_track(struct kmem_cache *s, void *object,
>  			enum track_item alloc, unsigned long addr)
>  {

noinline for debugging purposes?
I think it's okay, just a question.

>  	struct track *p = get_track(s, object, alloc);
>  
> -#ifdef CONFIG_STACKTRACE
> +#ifdef CONFIG_STACKDEPOT
> +	unsigned long entries[TRACK_ADDRS_COUNT];
>  	unsigned int nr_entries;
>  
> -	metadata_access_enable();
> -	nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
> -				      TRACK_ADDRS_COUNT, 3);
> -	metadata_access_disable();
> -
> -	if (nr_entries < TRACK_ADDRS_COUNT)
> -		p->addrs[nr_entries] = 0;
> +	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);
> +	p->handle = stack_depot_save(entries, nr_entries, GFP_NOWAIT);
>  #endif
> +
>  	p->addr = addr;
>  	p->cpu = smp_processor_id();
>  	p->pid = current->pid;
> @@ -759,20 +757,19 @@ static void init_tracking(struct kmem_cache *s, void *object)
>  
>  static void print_track(const char *s, struct track *t, unsigned long pr_time)
>  {
> +	depot_stack_handle_t handle __maybe_unused;
> +
>  	if (!t->addr)
>  		return;
>  
>  	pr_err("%s in %pS age=%lu cpu=%u pid=%d\n",
>  	       s, (void *)t->addr, pr_time - t->when, t->cpu, t->pid);
> -#ifdef CONFIG_STACKTRACE
> -	{
> -		int i;
> -		for (i = 0; i < TRACK_ADDRS_COUNT; i++)
> -			if (t->addrs[i])
> -				pr_err("\t%pS\n", (void *)t->addrs[i]);
> -			else
> -				break;
> -	}
> +#ifdef CONFIG_STACKDEPOT
> +	handle = READ_ONCE(t->handle);
> +	if (handle)
> +		stack_depot_print(handle);
> +	else
> +		pr_err("object allocation/free stack trace missing\n");
>  #endif
>  }
>  
> @@ -1532,6 +1529,8 @@ static int __init setup_slub_debug(char *str)
>  			global_slub_debug_changed = true;
>  		} else {
>  			slab_list_specified = true;
> +			if (flags & SLAB_STORE_USER)
> +				stack_depot_want_early_init = true;

This is updated to stack_depot_want_early_init() in v3.

>  		}
>  	}
>  
> @@ -1549,6 +1548,8 @@ static int __init setup_slub_debug(char *str)
>  	}
>  out:
>  	slub_debug = global_flags;
> +	if (slub_debug & SLAB_STORE_USER)
> +		stack_depot_want_early_init = true;

This too.

>  	if (slub_debug != 0 || slub_debug_string)
>  		static_branch_enable(&slub_debug_enabled);
>  	else
> @@ -4352,18 +4353,26 @@ void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
>  	objp = fixup_red_left(s, objp);
>  	trackp = get_track(s, objp, TRACK_ALLOC);
>  	kpp->kp_ret = (void *)trackp->addr;
> -#ifdef CONFIG_STACKTRACE
> -	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
> -		kpp->kp_stack[i] = (void *)trackp->addrs[i];
> -		if (!kpp->kp_stack[i])
> -			break;
> -	}
> +#ifdef CONFIG_STACKDEPOT
> +	{
> +		depot_stack_handle_t handle;
> +		unsigned long *entries;
> +		unsigned int nr_entries;
> +
> +		handle = READ_ONCE(trackp->handle);
> +		if (handle) {
> +			nr_entries = stack_depot_fetch(handle, &entries);
> +			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
> +				kpp->kp_stack[i] = (void *)entries[i];
> +		}
>  
> -	trackp = get_track(s, objp, TRACK_FREE);
> -	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
> -		kpp->kp_free_stack[i] = (void *)trackp->addrs[i];
> -		if (!kpp->kp_free_stack[i])
> -			break;
> +		trackp = get_track(s, objp, TRACK_FREE);
> +		handle = READ_ONCE(trackp->handle);
> +		if (handle) {
> +			nr_entries = stack_depot_fetch(handle, &entries);
> +			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
> +				kpp->kp_free_stack[i] = (void *)entries[i];
> +		}
>  	}
>  #endif
>  #endif
> -- 
> 2.35.1

Otherwise looks good.
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

kmem_dump_obj() and slab error report functionality works fine.
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

-- 
Thank you, You are awesome!
Hyeonggon :-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/6] mm/slub: use stackdepot to save stack trace in objects
  2022-03-04 11:25   ` Hyeonggon Yoo
@ 2022-03-04 12:10     ` Vlastimil Babka
  0 siblings, 0 replies; 19+ messages in thread
From: Vlastimil Babka @ 2022-03-04 12:10 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Marco Elver, Mike Rapoport,
	Imran Khan

On 3/4/22 12:25, Hyeonggon Yoo wrote:
> On Wed, Mar 02, 2022 at 06:31:19PM +0100, Vlastimil Babka wrote:
>> From: Oliver Glitta <glittao@gmail.com>
>> 
>> Many stack traces are similar so there are many similar arrays.
>> Stackdepot saves each unique stack only once.
>> 
>> Replace field addrs in struct track with depot_stack_handle_t handle.  Use
>> stackdepot to save stack trace.
>> 
>> The benefits are smaller memory overhead and the possibility to aggregate
>> per-cache statistics in the following patch using the stackdepot handle
>> instead of matching stacks manually.
>> 
>> [ vbabka@suse.cz: rebase to 5.17-rc1 and adjust accordingly ]
>> 
>> This was initially merged as commit 788691464c29 and reverted by commit
>> ae14c63a9f20 due to several issues that should now be fixed.
>> The problem of unconditional memory overhead by stackdepot has been
>> addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init
>> and stack_table allocation by kvmalloc()"), so the dependency on
>> stackdepot will result in extra memory usage only when a slab cache
>> tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
>> The build failures on some architectures were also addressed, and the
>> reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this
>> patch.
>> 
>> Signed-off-by: Oliver Glitta <glittao@gmail.com>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: Christoph Lameter <cl@linux.com>
>> Cc: Pekka Enberg <penberg@kernel.org>
>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>

...

>> @@ -314,9 +315,13 @@ kmem_cache_create_usercopy(const char *name,
>>  	 * If no slub_debug was enabled globally, the static key is not yet
>>  	 * enabled by setup_slub_debug(). Enable it if the cache is being
>>  	 * created with any of the debugging flags passed explicitly.
>> +	 * It's also possible that this is the first cache created with
>> +	 * SLAB_STORE_USER and we should init stack_depot for it.
>>  	 */
>>  	if (flags & SLAB_DEBUG_FLAGS)
>>  		static_branch_enable(&slub_debug_enabled);
>> +	if (flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
>> +		stack_depot_init();
>>  #endif
> 
> Is this comment and code still valid in v3?

The comment is still valid, as there can be a kmem_cache_create() call with
SLAB_STORE_USER (in fact there's one in kernel/rcu/rcutorture.c) that's not
covered by the slub_debug parsing.
The code in v3 is the same, just without the IS_ENABLED(CONFIG_STACKDEPOT)
check.
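
For illustration, such a caller looks roughly like this (a sketch; the
cache name and sizes are made up, not the actual rcutorture call):

	/* debug flag passed explicitly, independent of slub_debug= */
	struct kmem_cache *c = kmem_cache_create("demo_cache", 128, 0,
						 SLAB_STORE_USER, NULL);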

>>  	mutex_lock(&slab_mutex);
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 1fc451f4fe62..42cb79af70a0 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -26,6 +26,7 @@
>>  #include <linux/cpuset.h>
>>  #include <linux/mempolicy.h>
>>  #include <linux/ctype.h>
>> +#include <linux/stackdepot.h>
>>  #include <linux/debugobjects.h>
>>  #include <linux/kallsyms.h>
>>  #include <linux/kfence.h>
>> @@ -264,8 +265,8 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
>>  #define TRACK_ADDRS_COUNT 16
>>  struct track {
>>  	unsigned long addr;	/* Called from address */
>> -#ifdef CONFIG_STACKTRACE
>> -	unsigned long addrs[TRACK_ADDRS_COUNT];	/* Called from address */
>> +#ifdef CONFIG_STACKDEPOT
>> +	depot_stack_handle_t handle;
>>  #endif
>>  	int cpu;		/* Was running on cpu */
>>  	int pid;		/* Pid context */
>> @@ -724,22 +725,19 @@ static struct track *get_track(struct kmem_cache *s, void *object,
>>  	return kasan_reset_tag(p + alloc);
>>  }
>>  
>> -static void set_track(struct kmem_cache *s, void *object,
>> +static void noinline set_track(struct kmem_cache *s, void *object,
>>  			enum track_item alloc, unsigned long addr)
>>  {
> 
> noinline for debugging purpose?
> I think it's okay. just a question.

These noinlines make sure that the number of stack entries is stable and
not subject to the compiler's inlining decisions...

>>  	struct track *p = get_track(s, object, alloc);
>>  
>> -#ifdef CONFIG_STACKTRACE
>> +#ifdef CONFIG_STACKDEPOT
>> +	unsigned long entries[TRACK_ADDRS_COUNT];
>>  	unsigned int nr_entries;
>>  
>> -	metadata_access_enable();
>> -	nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
>> -				      TRACK_ADDRS_COUNT, 3);
>> -	metadata_access_disable();
>> -
>> -	if (nr_entries < TRACK_ADDRS_COUNT)
>> -		p->addrs[nr_entries] = 0;
>> +	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);

... so that the '3' here skips the correct number of 'internal' stack trace
entries that are not interesting to us.
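
To sketch the intent (frame names are illustrative, not the exact chain):

	/*
	 * stack_trace_save(entries, ARRAY_SIZE(entries), 3):
	 *
	 *   <capture machinery: set_track() and its callers>
	 *       -> reliably dropped by the constant skip of 3
	 *   <the allocation/free site we care about>
	 *       -> becomes the first recorded entry
	 *
	 * If the compiler were free to inline set_track(), the fixed
	 * skip count could swallow an interesting frame instead.
	 */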

>> +	p->handle = stack_depot_save(entries, nr_entries, GFP_NOWAIT);
>>  #endif
>> +
>>  	p->addr = addr;
>>  	p->cpu = smp_processor_id();
>>  	p->pid = current->pid;
>> @@ -759,20 +757,19 @@ static void init_tracking(struct kmem_cache *s, void *object)
>>  
>>  static void print_track(const char *s, struct track *t, unsigned long pr_time)
>>  {
>> +	depot_stack_handle_t handle __maybe_unused;
>> +
>>  	if (!t->addr)
>>  		return;
>>  
>>  	pr_err("%s in %pS age=%lu cpu=%u pid=%d\n",
>>  	       s, (void *)t->addr, pr_time - t->when, t->cpu, t->pid);
>> -#ifdef CONFIG_STACKTRACE
>> -	{
>> -		int i;
>> -		for (i = 0; i < TRACK_ADDRS_COUNT; i++)
>> -			if (t->addrs[i])
>> -				pr_err("\t%pS\n", (void *)t->addrs[i]);
>> -			else
>> -				break;
>> -	}
>> +#ifdef CONFIG_STACKDEPOT
>> +	handle = READ_ONCE(t->handle);
>> +	if (handle)
>> +		stack_depot_print(handle);
>> +	else
>> +		pr_err("object allocation/free stack trace missing\n");
>>  #endif
>>  }
>>  
>> @@ -1532,6 +1529,8 @@ static int __init setup_slub_debug(char *str)
>>  			global_slub_debug_changed = true;
>>  		} else {
>>  			slab_list_specified = true;
>> +			if (flags & SLAB_STORE_USER)
>> +				stack_depot_want_early_init = true;
> 
> This is updated to stack_depot_want_early_init() in v3.

Yes.

>>  		}
>>  	}
>>  
>> @@ -1549,6 +1548,8 @@ static int __init setup_slub_debug(char *str)
>>  	}
>>  out:
>>  	slub_debug = global_flags;
>> +	if (slub_debug & SLAB_STORE_USER)
>> +		stack_depot_want_early_init = true;
> 
> This too.

Yes.

>>  	if (slub_debug != 0 || slub_debug_string)
>>  		static_branch_enable(&slub_debug_enabled);
>>  	else
>> @@ -4352,18 +4353,26 @@ void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
>>  	objp = fixup_red_left(s, objp);
>>  	trackp = get_track(s, objp, TRACK_ALLOC);
>>  	kpp->kp_ret = (void *)trackp->addr;
>> -#ifdef CONFIG_STACKTRACE
>> -	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
>> -		kpp->kp_stack[i] = (void *)trackp->addrs[i];
>> -		if (!kpp->kp_stack[i])
>> -			break;
>> -	}
>> +#ifdef CONFIG_STACKDEPOT
>> +	{
>> +		depot_stack_handle_t handle;
>> +		unsigned long *entries;
>> +		unsigned int nr_entries;
>> +
>> +		handle = READ_ONCE(trackp->handle);
>> +		if (handle) {
>> +			nr_entries = stack_depot_fetch(handle, &entries);
>> +			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
>> +				kpp->kp_stack[i] = (void *)entries[i];
>> +		}
>>  
>> -	trackp = get_track(s, objp, TRACK_FREE);
>> -	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
>> -		kpp->kp_free_stack[i] = (void *)trackp->addrs[i];
>> -		if (!kpp->kp_free_stack[i])
>> -			break;
>> +		trackp = get_track(s, objp, TRACK_FREE);
>> +		handle = READ_ONCE(trackp->handle);
>> +		if (handle) {
>> +			nr_entries = stack_depot_fetch(handle, &entries);
>> +			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
>> +				kpp->kp_free_stack[i] = (void *)entries[i];
>> +		}
>>  	}
>>  #endif
>>  #endif
>> -- 
>> 2.35.1
> 
> Otherwise looks good.
> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> 
> kmem_dump_obj() and slab error report functionality works fine.
> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Thanks!


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread

Thread overview: 19+ messages
2022-03-02 17:31 [PATCH v2 0/6] SLUB debugfs improvements based on stackdepot Vlastimil Babka
2022-03-02 17:31 ` [PATCH v2 1/6] lib/stackdepot: allow requesting early initialization dynamically Vlastimil Babka
2022-03-02 17:47   ` Marco Elver
2022-03-02 18:02     ` Vlastimil Babka
2022-03-02 18:15       ` Marco Elver
2022-03-02 18:01   ` Mike Rapoport
2022-03-03 19:19   ` [PATCH v3r0 " Vlastimil Babka
2022-03-04  9:18     ` Marco Elver
2022-03-04 10:47     ` Hyeonggon Yoo
2022-03-04 11:02     ` Mike Rapoport
2022-03-02 17:31 ` [PATCH v2 2/6] mm/slub: move struct track init out of set_track() Vlastimil Babka
2022-03-02 17:31 ` [PATCH v2 3/6] mm/slub: use stackdepot to save stack trace in objects Vlastimil Babka
2022-03-04 11:25   ` Hyeonggon Yoo
2022-03-04 12:10     ` Vlastimil Babka
2022-03-02 17:31 ` [PATCH v2 4/6] mm/slub: distinguish and print stack traces in debugfs files Vlastimil Babka
2022-03-02 17:31 ` [PATCH v2 5/6] mm/slub: sort debugfs output by frequency of stack traces Vlastimil Babka
2022-03-02 17:31 ` [PATCH v2 6/6] slab, documentation: add description of debugfs files for SLUB caches Vlastimil Babka
2022-03-03  8:14   ` Hyeonggon Yoo
2022-03-03  9:33   ` Mike Rapoport
