* [PATCH 0/5] SLUB debugfs improvements based on stackdepot
@ 2022-02-25 18:03 Vlastimil Babka
  2022-02-25 18:03 ` [PATCH 1/5] mm/slub: move struct track init out of set_track() Vlastimil Babka
                   ` (6 more replies)
  0 siblings, 7 replies; 46+ messages in thread
From: Vlastimil Babka @ 2022-02-25 18:03 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Faiyaz Mohammed, Vlastimil Babka, Jonathan Corbet, Randy Dunlap

Hi,

this series combines and revives patches from Oliver's bachelor thesis
from last year (where I was the advisor) that make SLUB's debugfs files
alloc_traces and free_traces more useful.
The resubmission was blocked on stackdepot changes that are now merged,
as explained in patch 2.

Patch 1 is a new preparatory cleanup.

Patch 2, originally submitted here [1], was merged to mainline but
reverted due to stackdepot-related issues, as explained in the patch.

Patches 3-5 were originally submitted as RFC here [2]. In this submission
I have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
be considered too intrusive, so I will postpone it for later. The docs
patch is adjusted accordingly.

Also available in git, based on v5.17-rc1:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1

I'd like to ask for some review before I add this to the slab tree.

[1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
[2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/

Oliver Glitta (4):
  mm/slub: use stackdepot to save stack trace in objects
  mm/slub: aggregate and print stack traces in debugfs files
  mm/slub: sort debugfs output by frequency of stack traces
  slab, documentation: add description of debugfs files for SLUB caches

Vlastimil Babka (1):
  mm/slub: move struct track init out of set_track()

 Documentation/vm/slub.rst |  61 +++++++++++++++
 init/Kconfig              |   1 +
 mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
 3 files changed, 162 insertions(+), 52 deletions(-)

-- 
2.35.1




* [PATCH 1/5] mm/slub: move struct track init out of set_track()
  2022-02-25 18:03 [PATCH 0/5] SLUB debugfs improvements based on stackdepot Vlastimil Babka
@ 2022-02-25 18:03 ` Vlastimil Babka
  2022-02-26 10:41   ` Hyeonggon Yoo
  2022-02-25 18:03 ` [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects Vlastimil Babka
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 46+ messages in thread
From: Vlastimil Babka @ 2022-02-25 18:03 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Faiyaz Mohammed, Vlastimil Babka

set_track() either zeroes out the struct track or fills it, depending on
the addr parameter. This is unnecessary as there's only one place that
calls it for the initialization - init_tracking(). We can simply do the
zeroing there, with a single memset() that covers both TRACK_ALLOC and
TRACK_FREE as they are adjacent.
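
For reference, a sketch (not part of the patch) of the equivalence this
relies on: get_track() returns a pointer into a two-element array of
struct track indexed by enum track_item, where TRACK_ALLOC == 0 and
TRACK_FREE == 1, so

	struct track *p = get_track(s, object, TRACK_ALLOC);

	/* zeroes both the TRACK_ALLOC slot and the adjacent TRACK_FREE slot */
	memset(p, 0, 2 * sizeof(struct track));

clears both tracks in one call.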

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 261474092e43..1fc451f4fe62 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -729,34 +729,32 @@ static void set_track(struct kmem_cache *s, void *object,
 {
 	struct track *p = get_track(s, object, alloc);
 
-	if (addr) {
 #ifdef CONFIG_STACKTRACE
-		unsigned int nr_entries;
+	unsigned int nr_entries;
 
-		metadata_access_enable();
-		nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
-					      TRACK_ADDRS_COUNT, 3);
-		metadata_access_disable();
+	metadata_access_enable();
+	nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
+				      TRACK_ADDRS_COUNT, 3);
+	metadata_access_disable();
 
-		if (nr_entries < TRACK_ADDRS_COUNT)
-			p->addrs[nr_entries] = 0;
+	if (nr_entries < TRACK_ADDRS_COUNT)
+		p->addrs[nr_entries] = 0;
 #endif
-		p->addr = addr;
-		p->cpu = smp_processor_id();
-		p->pid = current->pid;
-		p->when = jiffies;
-	} else {
-		memset(p, 0, sizeof(struct track));
-	}
+	p->addr = addr;
+	p->cpu = smp_processor_id();
+	p->pid = current->pid;
+	p->when = jiffies;
 }
 
 static void init_tracking(struct kmem_cache *s, void *object)
 {
+	struct track *p;
+
 	if (!(s->flags & SLAB_STORE_USER))
 		return;
 
-	set_track(s, object, TRACK_FREE, 0UL);
-	set_track(s, object, TRACK_ALLOC, 0UL);
+	p = get_track(s, object, TRACK_ALLOC);
+	memset(p, 0, 2*sizeof(struct track));
 }
 
 static void print_track(const char *s, struct track *t, unsigned long pr_time)
-- 
2.35.1




* [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects
  2022-02-25 18:03 [PATCH 0/5] SLUB debugfs improvements based on stackdepot Vlastimil Babka
  2022-02-25 18:03 ` [PATCH 1/5] mm/slub: move struct track init out of set_track() Vlastimil Babka
@ 2022-02-25 18:03 ` Vlastimil Babka
  2022-02-26 10:24   ` Hyeonggon Yoo
                     ` (2 more replies)
  2022-02-25 18:03 ` [PATCH 3/5] mm/slub: aggregate and print stack traces in debugfs files Vlastimil Babka
                   ` (4 subsequent siblings)
  6 siblings, 3 replies; 46+ messages in thread
From: Vlastimil Babka @ 2022-02-25 18:03 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Faiyaz Mohammed, Vlastimil Babka

From: Oliver Glitta <glittao@gmail.com>

Many stack traces are similar, so the per-object addrs arrays in struct
track end up storing a lot of duplicate data. Stackdepot saves each
unique stack only once.

Replace the addrs field in struct track with a depot_stack_handle_t
handle and use stackdepot to save the stack trace.

The benefits are smaller memory overhead and possibility to aggregate
per-cache statistics in the following patch using the stackdepot handle
instead of matching stacks manually.

[ vbabka@suse.cz: rebase to 5.17-rc1 and adjust accordingly ]

This was initially merged as commit 788691464c29 and reverted by commit
ae14c63a9f20 due to several issues that should now be fixed.
The problem of unconditional memory overhead by stackdepot has been
addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init
and stack_table allocation by kvmalloc()"), so the dependency on
stackdepot will result in extra memory usage only when slab cache
tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
The build failures on some architectures were also addressed, and the
reported issue with the xfs/433 test did not reproduce on 5.17-rc1 with
this patch.
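
For context, a minimal sketch (not part of the patch) of the stackdepot
save/fetch flow that the new set_track() and print_track() rely on. The
helper names below are made up for illustration; the stack_trace_*() and
stack_depot_*() calls are from <linux/stacktrace.h> and
<linux/stackdepot.h>:

	#include <linux/stacktrace.h>
	#include <linux/stackdepot.h>

	static depot_stack_handle_t record_stack(gfp_t flags)
	{
		unsigned long entries[16];
		unsigned int nr_entries;

		/* capture the current stack, skipping the innermost frames */
		nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);

		/* identical stacks are stored once and map to the same handle */
		return stack_depot_save(entries, nr_entries, flags);
	}

	static void show_stack_handle(depot_stack_handle_t handle)
	{
		unsigned long *entries;
		unsigned int nr_entries;

		if (!handle)
			return;

		/* a handle can be resolved back to the saved entries ... */
		nr_entries = stack_depot_fetch(handle, &entries);
		stack_trace_print(entries, nr_entries, 0);

		/* ... or printed directly */
		stack_depot_print(handle);
	}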

Signed-off-by: Oliver Glitta <glittao@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 init/Kconfig |  1 +
 mm/slub.c    | 88 +++++++++++++++++++++++++++++-----------------------
 2 files changed, 50 insertions(+), 39 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index e9119bf54b1f..b21dd3a4a106 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1871,6 +1871,7 @@ config SLUB_DEBUG
 	default y
 	bool "Enable SLUB debugging support" if EXPERT
 	depends on SLUB && SYSFS
+	select STACKDEPOT if STACKTRACE_SUPPORT
 	help
 	  SLUB has extensive debug support features. Disabling these can
 	  result in significant savings in code size. This also disables
diff --git a/mm/slub.c b/mm/slub.c
index 1fc451f4fe62..3140f763e819 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -26,6 +26,7 @@
 #include <linux/cpuset.h>
 #include <linux/mempolicy.h>
 #include <linux/ctype.h>
+#include <linux/stackdepot.h>
 #include <linux/debugobjects.h>
 #include <linux/kallsyms.h>
 #include <linux/kfence.h>
@@ -264,8 +265,8 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
 #define TRACK_ADDRS_COUNT 16
 struct track {
 	unsigned long addr;	/* Called from address */
-#ifdef CONFIG_STACKTRACE
-	unsigned long addrs[TRACK_ADDRS_COUNT];	/* Called from address */
+#ifdef CONFIG_STACKDEPOT
+	depot_stack_handle_t handle;
 #endif
 	int cpu;		/* Was running on cpu */
 	int pid;		/* Pid context */
@@ -724,22 +725,20 @@ static struct track *get_track(struct kmem_cache *s, void *object,
 	return kasan_reset_tag(p + alloc);
 }
 
-static void set_track(struct kmem_cache *s, void *object,
-			enum track_item alloc, unsigned long addr)
+static noinline void
+set_track(struct kmem_cache *s, void *object, enum track_item alloc,
+	  unsigned long addr, gfp_t flags)
 {
 	struct track *p = get_track(s, object, alloc);
 
-#ifdef CONFIG_STACKTRACE
+#ifdef CONFIG_STACKDEPOT
+	unsigned long entries[TRACK_ADDRS_COUNT];
 	unsigned int nr_entries;
 
-	metadata_access_enable();
-	nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
-				      TRACK_ADDRS_COUNT, 3);
-	metadata_access_disable();
-
-	if (nr_entries < TRACK_ADDRS_COUNT)
-		p->addrs[nr_entries] = 0;
+	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);
+	p->handle = stack_depot_save(entries, nr_entries, flags);
 #endif
+
 	p->addr = addr;
 	p->cpu = smp_processor_id();
 	p->pid = current->pid;
@@ -759,20 +758,19 @@ static void init_tracking(struct kmem_cache *s, void *object)
 
 static void print_track(const char *s, struct track *t, unsigned long pr_time)
 {
+	depot_stack_handle_t handle __maybe_unused;
+
 	if (!t->addr)
 		return;
 
 	pr_err("%s in %pS age=%lu cpu=%u pid=%d\n",
 	       s, (void *)t->addr, pr_time - t->when, t->cpu, t->pid);
-#ifdef CONFIG_STACKTRACE
-	{
-		int i;
-		for (i = 0; i < TRACK_ADDRS_COUNT; i++)
-			if (t->addrs[i])
-				pr_err("\t%pS\n", (void *)t->addrs[i]);
-			else
-				break;
-	}
+#ifdef CONFIG_STACKDEPOT
+	handle = READ_ONCE(t->handle);
+	if (handle)
+		stack_depot_print(handle);
+	else
+		pr_err("object allocation/free stack trace missing\n");
 #endif
 }
 
@@ -1304,9 +1302,9 @@ static inline int alloc_consistency_checks(struct kmem_cache *s,
 	return 1;
 }
 
-static noinline int alloc_debug_processing(struct kmem_cache *s,
-					struct slab *slab,
-					void *object, unsigned long addr)
+static noinline int
+alloc_debug_processing(struct kmem_cache *s, struct slab *slab, void *object,
+		       unsigned long addr, gfp_t flags)
 {
 	if (s->flags & SLAB_CONSISTENCY_CHECKS) {
 		if (!alloc_consistency_checks(s, slab, object))
@@ -1315,7 +1313,7 @@ static noinline int alloc_debug_processing(struct kmem_cache *s,
 
 	/* Success perform special debug activities for allocs */
 	if (s->flags & SLAB_STORE_USER)
-		set_track(s, object, TRACK_ALLOC, addr);
+		set_track(s, object, TRACK_ALLOC, addr, flags);
 	trace(s, slab, object, 1);
 	init_object(s, object, SLUB_RED_ACTIVE);
 	return 1;
@@ -1395,7 +1393,7 @@ static noinline int free_debug_processing(
 	}
 
 	if (s->flags & SLAB_STORE_USER)
-		set_track(s, object, TRACK_FREE, addr);
+		set_track(s, object, TRACK_FREE, addr, GFP_NOWAIT);
 	trace(s, slab, object, 0);
 	/* Freepointer not overwritten by init_object(), SLAB_POISON moved it */
 	init_object(s, object, SLUB_RED_INACTIVE);
@@ -1632,7 +1630,8 @@ static inline
 void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}
 
 static inline int alloc_debug_processing(struct kmem_cache *s,
-	struct slab *slab, void *object, unsigned long addr) { return 0; }
+	struct slab *slab, void *object, unsigned long addr,
+	gfp_t flags) { return 0; }
 
 static inline int free_debug_processing(
 	struct kmem_cache *s, struct slab *slab,
@@ -3033,7 +3032,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 check_new_slab:
 
 	if (kmem_cache_debug(s)) {
-		if (!alloc_debug_processing(s, slab, freelist, addr)) {
+		if (!alloc_debug_processing(s, slab, freelist, addr, gfpflags)) {
 			/* Slab failed checks. Next slab needed */
 			goto new_slab;
 		} else {
@@ -4221,6 +4220,9 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
 	s->remote_node_defrag_ratio = 1000;
 #endif
 
+	if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
+		stack_depot_init();
+
 	/* Initialize the pre-computed randomized freelist if slab is up */
 	if (slab_state >= UP) {
 		if (init_cache_random_seq(s))
@@ -4352,18 +4354,26 @@ void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
 	objp = fixup_red_left(s, objp);
 	trackp = get_track(s, objp, TRACK_ALLOC);
 	kpp->kp_ret = (void *)trackp->addr;
-#ifdef CONFIG_STACKTRACE
-	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
-		kpp->kp_stack[i] = (void *)trackp->addrs[i];
-		if (!kpp->kp_stack[i])
-			break;
-	}
+#ifdef CONFIG_STACKDEPOT
+	{
+		depot_stack_handle_t handle;
+		unsigned long *entries;
+		unsigned int nr_entries;
+
+		handle = READ_ONCE(trackp->handle);
+		if (handle) {
+			nr_entries = stack_depot_fetch(handle, &entries);
+			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
+				kpp->kp_stack[i] = (void *)entries[i];
+		}
 
-	trackp = get_track(s, objp, TRACK_FREE);
-	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
-		kpp->kp_free_stack[i] = (void *)trackp->addrs[i];
-		if (!kpp->kp_free_stack[i])
-			break;
+		trackp = get_track(s, objp, TRACK_FREE);
+		handle = READ_ONCE(trackp->handle);
+		if (handle) {
+			nr_entries = stack_depot_fetch(handle, &entries);
+			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
+				kpp->kp_free_stack[i] = (void *)entries[i];
+		}
 	}
 #endif
 #endif
-- 
2.35.1




* [PATCH 3/5] mm/slub: aggregate and print stack traces in debugfs files
  2022-02-25 18:03 [PATCH 0/5] SLUB debugfs improvements based on stackdepot Vlastimil Babka
  2022-02-25 18:03 ` [PATCH 1/5] mm/slub: move struct track init out of set_track() Vlastimil Babka
  2022-02-25 18:03 ` [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects Vlastimil Babka
@ 2022-02-25 18:03 ` Vlastimil Babka
  2022-02-27  0:18   ` Hyeonggon Yoo
  2022-02-27  0:22   ` Hyeonggon Yoo
  2022-02-25 18:03 ` [PATCH 4/5] mm/slub: sort debugfs output by frequency of stack traces Vlastimil Babka
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 46+ messages in thread
From: Vlastimil Babka @ 2022-02-25 18:03 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Faiyaz Mohammed, Vlastimil Babka

From: Oliver Glitta <glittao@gmail.com>

Aggregate objects in a slub cache by stack trace in addition to caller
address when producing the contents of the alloc_traces and free_traces
debugfs files. Also add the stack traces to the debugfs output. This
makes the files much more useful for e.g. debugging memory leaks.

Signed-off-by: Oliver Glitta <glittao@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 3140f763e819..06599db4faa3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5075,6 +5075,7 @@ EXPORT_SYMBOL(validate_slab_cache);
  */
 
 struct location {
+	depot_stack_handle_t handle;
 	unsigned long count;
 	unsigned long addr;
 	long long sum_time;
@@ -5127,9 +5128,13 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 {
 	long start, end, pos;
 	struct location *l;
-	unsigned long caddr;
+	unsigned long caddr, chandle;
 	unsigned long age = jiffies - track->when;
+	depot_stack_handle_t handle = 0;
 
+#ifdef CONFIG_STACKDEPOT
+	handle = READ_ONCE(track->handle);
+#endif
 	start = -1;
 	end = t->count;
 
@@ -5144,7 +5149,8 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 			break;
 
 		caddr = t->loc[pos].addr;
-		if (track->addr == caddr) {
+		chandle = t->loc[pos].handle;
+		if ((track->addr == caddr) && (handle == chandle)) {
 
 			l = &t->loc[pos];
 			l->count++;
@@ -5169,6 +5175,8 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 
 		if (track->addr < caddr)
 			end = pos;
+		else if (track->addr == caddr && handle < chandle)
+			end = pos;
 		else
 			start = pos;
 	}
@@ -5191,6 +5199,7 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 	l->max_time = age;
 	l->min_pid = track->pid;
 	l->max_pid = track->pid;
+	l->handle = handle;
 	cpumask_clear(to_cpumask(l->cpus));
 	cpumask_set_cpu(track->cpu, to_cpumask(l->cpus));
 	nodes_clear(l->nodes);
@@ -6102,6 +6111,21 @@ static int slab_debugfs_show(struct seq_file *seq, void *v)
 			seq_printf(seq, " nodes=%*pbl",
 				 nodemask_pr_args(&l->nodes));
 
+#ifdef CONFIG_STACKDEPOT
+		{
+			depot_stack_handle_t handle;
+			unsigned long *entries;
+			unsigned int nr_entries, j;
+
+			handle = READ_ONCE(l->handle);
+			if (handle) {
+				nr_entries = stack_depot_fetch(handle, &entries);
+				seq_puts(seq, "\n");
+				for (j = 0; j < nr_entries; j++)
+					seq_printf(seq, "        %pS\n", (void *)entries[j]);
+			}
+		}
+#endif
 		seq_puts(seq, "\n");
 	}
 
-- 
2.35.1




* [PATCH 4/5] mm/slub: sort debugfs output by frequency of stack traces
  2022-02-25 18:03 [PATCH 0/5] SLUB debugfs improvements based on stackdepot Vlastimil Babka
                   ` (2 preceding siblings ...)
  2022-02-25 18:03 ` [PATCH 3/5] mm/slub: aggregate and print stack traces in debugfs files Vlastimil Babka
@ 2022-02-25 18:03 ` Vlastimil Babka
  2022-02-26 11:03   ` Hyeonggon Yoo
  2022-02-25 18:03 ` [PATCH 5/5] slab, documentation: add description of debugfs files for SLUB caches Vlastimil Babka
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 46+ messages in thread
From: Vlastimil Babka @ 2022-02-25 18:03 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Faiyaz Mohammed, Vlastimil Babka

From: Oliver Glitta <glittao@gmail.com>

Sort the output of debugfs alloc_traces and free_traces by the frequency
of allocation/freeing stack traces. Most frequently used stack traces
will be printed first, e.g. for easier memory leak debugging.

Signed-off-by: Oliver Glitta <glittao@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index 06599db4faa3..a74afe59a403 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -38,6 +38,7 @@
 #include <linux/memcontrol.h>
 #include <linux/random.h>
 #include <kunit/test.h>
+#include <linux/sort.h>
 
 #include <linux/debugfs.h>
 #include <trace/events/kmem.h>
@@ -6150,6 +6151,17 @@ static void *slab_debugfs_next(struct seq_file *seq, void *v, loff_t *ppos)
 	return NULL;
 }
 
+static int cmp_loc_by_count(const void *a, const void *b, const void *data)
+{
+	struct location *loc1 = (struct location *)a;
+	struct location *loc2 = (struct location *)b;
+
+	if (loc1->count > loc2->count)
+		return -1;
+	else
+		return 1;
+}
+
 static void *slab_debugfs_start(struct seq_file *seq, loff_t *ppos)
 {
 	struct loc_track *t = seq->private;
@@ -6211,6 +6223,10 @@ static int slab_debug_trace_open(struct inode *inode, struct file *filep)
 		spin_unlock_irqrestore(&n->list_lock, flags);
 	}
 
+	/* Sort locations by count */
+	sort_r(t->loc, t->count, sizeof(struct location),
+		cmp_loc_by_count, NULL, NULL);
+
 	bitmap_free(obj_map);
 	return 0;
 }
-- 
2.35.1




* [PATCH 5/5] slab, documentation: add description of debugfs files for SLUB caches
  2022-02-25 18:03 [PATCH 0/5] SLUB debugfs improvements based on stackdepot Vlastimil Babka
                   ` (3 preceding siblings ...)
  2022-02-25 18:03 ` [PATCH 4/5] mm/slub: sort debugfs output by frequency of stack traces Vlastimil Babka
@ 2022-02-25 18:03 ` Vlastimil Babka
  2022-02-27  3:49   ` Hyeonggon Yoo
  2022-02-26  7:19 ` [PATCH 0/5] SLUB debugfs improvements based on stackdepot Hyeonggon Yoo
  2022-02-26 12:18 ` Hyeonggon Yoo
  6 siblings, 1 reply; 46+ messages in thread
From: Vlastimil Babka @ 2022-02-25 18:03 UTC (permalink / raw)
  To: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin
  Cc: Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Faiyaz Mohammed, Vlastimil Babka, Jonathan Corbet, Randy Dunlap,
	linux-doc

From: Oliver Glitta <glittao@gmail.com>

Add description of debugfs files alloc_traces and free_traces
to SLUB cache documentation.

[ vbabka@suse.cz: some rewording ]

Signed-off-by: Oliver Glitta <glittao@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
---
 Documentation/vm/slub.rst | 61 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index d3028554b1e9..2b2b931e59fc 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -384,5 +384,66 @@ c) Execute ``slabinfo-gnuplot.sh`` in '-t' mode, passing all of the
       40,60`` range will plot only samples collected between 40th and
       60th seconds).
 
+
+DebugFS files for SLUB
+======================
+
+For more information about current state of SLUB caches with the user tracking
+debug option enabled, debugfs files are available, typically under
+/sys/kernel/debug/slab/<cache>/ (created only for caches with enabled user
+tracking). There are 2 types of these files with the following debug
+information:
+
+1. alloc_traces::
+
+    Prints information about unique allocation traces of the currently
+    allocated objects. The output is sorted by frequency of each trace.
+
+    Information in the output:
+    Number of objects, allocating function, minimal/average/maximal jiffies since alloc,
+    pid range of the allocating processes, cpu mask of allocating cpus, and stack trace.
+
+    Example:::
+
+    1085 populate_error_injection_list+0x97/0x110 age=166678/166680/166682 pid=1 cpus=1::
+	__slab_alloc+0x6d/0x90
+	kmem_cache_alloc_trace+0x2eb/0x300
+	populate_error_injection_list+0x97/0x110
+	init_error_injection+0x1b/0x71
+	do_one_initcall+0x5f/0x2d0
+	kernel_init_freeable+0x26f/0x2d7
+	kernel_init+0xe/0x118
+	ret_from_fork+0x22/0x30
+
+
+2. free_traces::
+
+    Prints information about unique free traces of the currently free objects,
+    sorted by their frequency.
+
+    Information in the output:
+    Number of objects, freeing function, minimal/average/maximal jiffies since free,
+    pid range of the freeing processes, cpu mask of freeing cpus, and stack trace.
+
+    Example:::
+
+    51 acpi_ut_update_ref_count+0x6a6/0x782 age=236886/237027/237772 pid=1 cpus=1
+	kfree+0x2db/0x420
+	acpi_ut_update_ref_count+0x6a6/0x782
+	acpi_ut_update_object_reference+0x1ad/0x234
+	acpi_ut_remove_reference+0x7d/0x84
+	acpi_rs_get_prt_method_data+0x97/0xd6
+	acpi_get_irq_routing_table+0x82/0xc4
+	acpi_pci_irq_find_prt_entry+0x8e/0x2e0
+	acpi_pci_irq_lookup+0x3a/0x1e0
+	acpi_pci_irq_enable+0x77/0x240
+	pcibios_enable_device+0x39/0x40
+	do_pci_enable_device.part.0+0x5d/0xe0
+	pci_enable_device_flags+0xfc/0x120
+	pci_enable_device+0x13/0x20
+	virtio_pci_probe+0x9e/0x170
+	local_pci_probe+0x48/0x80
+	pci_device_probe+0x105/0x1c0
+
 Christoph Lameter, May 30, 2007
 Sergey Senozhatsky, October 23, 2015
-- 
2.35.1




* Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot
  2022-02-25 18:03 [PATCH 0/5] SLUB debugfs improvements based on stackdepot Vlastimil Babka
                   ` (4 preceding siblings ...)
  2022-02-25 18:03 ` [PATCH 5/5] slab, documentation: add description of debugfs files for SLUB caches Vlastimil Babka
@ 2022-02-26  7:19 ` Hyeonggon Yoo
  2022-02-28 19:10   ` Vlastimil Babka
  2022-02-26 12:18 ` Hyeonggon Yoo
  6 siblings, 1 reply; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-02-26  7:19 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet, Randy Dunlap

On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> Hi,
> 
> this series combines and revives patches from Oliver's last year
> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> files alloc_traces and free_traces more useful.
> The resubmission was blocked on stackdepot changes that are now merged,
> as explained in patch 2.
> 

Hello. I just started review/testing this series.

It crashed on my system (arm64).

I ran with boot parameter slub_debug=U, and without KASAN.
So CONFIG_STACKDEPOT_ALWAYS_INIT=n.

void * __init memblock_alloc_try_nid(
                        phys_addr_t size, phys_addr_t align,
                        phys_addr_t min_addr, phys_addr_t max_addr,
                        int nid)
{
        void *ptr;

        memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
                     __func__, (u64)size, (u64)align, nid, &min_addr,
                     &max_addr, (void *)_RET_IP_);
        ptr = memblock_alloc_internal(size, align,
                                           min_addr, max_addr, nid, false);
        if (ptr)
                memset(ptr, 0, size); <--- Crash Here

        return ptr;
}

It crashed during create_boot_cache() -> stack_depot_init() ->
memblock_alloc().

I think that's because, in kmem_cache_init(), both the slab allocator and
memblock are unavailable. (AFAIU memblock is not available after mem_init()
because of memblock_free_all(), right?)

Thanks!

/*
 * Set up kernel memory allocators
 */
static void __init mm_init(void)
{
        /*
         * page_ext requires contiguous pages,
         * bigger than MAX_ORDER unless SPARSEMEM.
         */
        page_ext_init_flatmem();
        init_mem_debugging_and_hardening();
        kfence_alloc_pool();
        report_meminit();
        stack_depot_early_init();
        mem_init();
        mem_init_print_info();
        kmem_cache_init();
        /*
         * page_owner must be initialized after buddy is ready, and also after
         * slab is ready so that stack_depot_init() works properly
         */

> Patch 1 is a new preparatory cleanup.
> 
> Patch 2 originally submitted here [1], was merged to mainline but
> reverted for stackdepot related issues as explained in the patch.
> 
> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> be considered too intrusive so I will postpone it for later. The docs
> patch is adjusted accordingly.
> 
> Also available in git, based on v5.17-rc1:
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
> 
> I'd like to ask for some review before I add this to the slab tree.
> 
> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
> 
> Oliver Glitta (4):
>   mm/slub: use stackdepot to save stack trace in objects
>   mm/slub: aggregate and print stack traces in debugfs files
>   mm/slub: sort debugfs output by frequency of stack traces
>   slab, documentation: add description of debugfs files for SLUB caches
> 
> Vlastimil Babka (1):
>   mm/slub: move struct track init out of set_track()
> 
>  Documentation/vm/slub.rst |  61 +++++++++++++++
>  init/Kconfig              |   1 +
>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
>  3 files changed, 162 insertions(+), 52 deletions(-)
> 
> -- 
> 2.35.1
> 
> 

-- 
Thank you, You are awesome!
Hyeonggon :-)



* Re: [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects
  2022-02-25 18:03 ` [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects Vlastimil Babka
@ 2022-02-26 10:24   ` Hyeonggon Yoo
  2022-02-28 18:44     ` Vlastimil Babka
  2022-02-27  3:08   ` [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable Hyeonggon Yoo
  2022-02-27  9:44   ` [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects Hyeonggon Yoo
  2 siblings, 1 reply; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-02-26 10:24 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed

On Fri, Feb 25, 2022 at 07:03:15PM +0100, Vlastimil Babka wrote:
> From: Oliver Glitta <glittao@gmail.com>
> 
> Many stack traces are similar so there are many similar arrays.
> Stackdepot saves each unique stack only once.
> 
> Replace field addrs in struct track with depot_stack_handle_t handle.  Use
> stackdepot to save stack trace.
> 
> The benefits are smaller memory overhead and possibility to aggregate
> per-cache statistics in the following patch using the stackdepot handle
> instead of matching stacks manually.
> 
> [ vbabka@suse.cz: rebase to 5.17-rc1 and adjust accordingly ]
> 
> This was initially merged as commit 788691464c29 and reverted by commit
> ae14c63a9f20 due to several issues, that should now be fixed.
> The problem of unconditional memory overhead by stackdepot has been
> addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init
> and stack_table allocation by kvmalloc()"), so the dependency on
> stackdepot will result in extra memory usage only when a slab cache
> tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
> The build failures on some architectures were also addressed, and the
> reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this
> patch.
> 
> Signed-off-by: Oliver Glitta <glittao@gmail.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  init/Kconfig |  1 +
>  mm/slub.c    | 88 +++++++++++++++++++++++++++++-----------------------
>  2 files changed, 50 insertions(+), 39 deletions(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index e9119bf54b1f..b21dd3a4a106 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1871,6 +1871,7 @@ config SLUB_DEBUG
>  	default y
>  	bool "Enable SLUB debugging support" if EXPERT
>  	depends on SLUB && SYSFS
> +	select STACKDEPOT if STACKTRACE_SUPPORT
>  	help
>  	  SLUB has extensive debug support features. Disabling these can
>  	  result in significant savings in code size. This also disables
> diff --git a/mm/slub.c b/mm/slub.c
> index 1fc451f4fe62..3140f763e819 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -26,6 +26,7 @@
>  #include <linux/cpuset.h>
>  #include <linux/mempolicy.h>
>  #include <linux/ctype.h>
> +#include <linux/stackdepot.h>
>  #include <linux/debugobjects.h>
>  #include <linux/kallsyms.h>
>  #include <linux/kfence.h>
> @@ -264,8 +265,8 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
>  #define TRACK_ADDRS_COUNT 16
>  struct track {
>  	unsigned long addr;	/* Called from address */
> -#ifdef CONFIG_STACKTRACE
> -	unsigned long addrs[TRACK_ADDRS_COUNT];	/* Called from address */
> +#ifdef CONFIG_STACKDEPOT
> +	depot_stack_handle_t handle;
>  #endif
>  	int cpu;		/* Was running on cpu */
>  	int pid;		/* Pid context */
> @@ -724,22 +725,20 @@ static struct track *get_track(struct kmem_cache *s, void *object,
>  	return kasan_reset_tag(p + alloc);
>  }
>  
> -static void set_track(struct kmem_cache *s, void *object,
> -			enum track_item alloc, unsigned long addr)
> +static noinline void
> +set_track(struct kmem_cache *s, void *object, enum track_item alloc,
> +	  unsigned long addr, gfp_t flags)
>  {
>  	struct track *p = get_track(s, object, alloc);
>  
> -#ifdef CONFIG_STACKTRACE
> +#ifdef CONFIG_STACKDEPOT
> +	unsigned long entries[TRACK_ADDRS_COUNT];
>  	unsigned int nr_entries;
>  
> -	metadata_access_enable();
> -	nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
> -				      TRACK_ADDRS_COUNT, 3);
> -	metadata_access_disable();
> -
> -	if (nr_entries < TRACK_ADDRS_COUNT)
> -		p->addrs[nr_entries] = 0;
> +	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);
> +	p->handle = stack_depot_save(entries, nr_entries, flags);
>  #endif
> +
>  	p->addr = addr;
>  	p->cpu = smp_processor_id();
>  	p->pid = current->pid;
> @@ -759,20 +758,19 @@ static void init_tracking(struct kmem_cache *s, void *object)
>  
>  static void print_track(const char *s, struct track *t, unsigned long pr_time)
>  {
> +	depot_stack_handle_t handle __maybe_unused;
> +
>  	if (!t->addr)
>  		return;
>  
>  	pr_err("%s in %pS age=%lu cpu=%u pid=%d\n",
>  	       s, (void *)t->addr, pr_time - t->when, t->cpu, t->pid);
> -#ifdef CONFIG_STACKTRACE
> -	{
> -		int i;
> -		for (i = 0; i < TRACK_ADDRS_COUNT; i++)
> -			if (t->addrs[i])
> -				pr_err("\t%pS\n", (void *)t->addrs[i]);
> -			else
> -				break;
> -	}
> +#ifdef CONFIG_STACKDEPOT
> +	handle = READ_ONCE(t->handle);
> +	if (handle)
> +		stack_depot_print(handle);
> +	else
> +		pr_err("object allocation/free stack trace missing\n");
>  #endif
>  }
>  
> @@ -1304,9 +1302,9 @@ static inline int alloc_consistency_checks(struct kmem_cache *s,
>  	return 1;
>  }
>  
> -static noinline int alloc_debug_processing(struct kmem_cache *s,
> -					struct slab *slab,
> -					void *object, unsigned long addr)
> +static noinline int
> +alloc_debug_processing(struct kmem_cache *s, struct slab *slab, void *object,
> +		       unsigned long addr, gfp_t flags)
>  {
>  	if (s->flags & SLAB_CONSISTENCY_CHECKS) {
>  		if (!alloc_consistency_checks(s, slab, object))
> @@ -1315,7 +1313,7 @@ static noinline int alloc_debug_processing(struct kmem_cache *s,
>  
>  	/* Success perform special debug activities for allocs */
>  	if (s->flags & SLAB_STORE_USER)
> -		set_track(s, object, TRACK_ALLOC, addr);
> +		set_track(s, object, TRACK_ALLOC, addr, flags);

I see a warning because of this.
We should not reuse flags here, because alloc_debug_processing() can be
called with preemption disabled while the caller specified GFP_KERNEL, so
stack_depot_save() may attempt a sleeping allocation.

[    2.015902] BUG: sleeping function called from invalid context at mm/page_alloc.c:5164
[    2.022052] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
[    2.028357] preempt_count: 1, expected: 0
[    2.031508] RCU nest depth: 0, expected: 0
[    2.034722] 1 lock held by swapper/0/1:
[    2.037905]  #0: ffff00000488f4d0 (&sb->s_type->i_mutex_key#5){+.+.}-{4:4}, at: start_creating+0x58/0x130
[    2.045393] Preemption disabled at:
[    2.045400] [<ffff8000083bd008>] __slab_alloc.constprop.0+0x38/0xc0
[    2.053039] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.17.0-rc5+ #105
[    2.059365] Hardware name: linux,dummy-virt (DT)
[    2.063160] Call trace:
[    2.065217]  dump_backtrace+0xf8/0x130
[    2.068350]  show_stack+0x24/0x80
[    2.071104]  dump_stack_lvl+0x9c/0xd8
[    2.074140]  dump_stack+0x18/0x34
[    2.076894]  __might_resched+0x1a0/0x280
[    2.080146]  __might_sleep+0x58/0x90
[    2.083108]  prepare_alloc_pages.constprop.0+0x1b4/0x1f0
[    2.087468]  __alloc_pages+0x88/0x1e0
[    2.090502]  alloc_page_interleave+0x24/0xb4
[    2.094021]  alloc_pages+0x10c/0x170
[    2.096984]  __stack_depot_save+0x3e0/0x4e0
[    2.100446]  stack_depot_save+0x14/0x20
[    2.103617]  set_track.isra.0+0x64/0xa4
[    2.106787]  alloc_debug_processing+0x11c/0x1e0
[    2.110532]  ___slab_alloc+0x3e8/0x750
[    2.113643]  __slab_alloc.constprop.0+0x64/0xc0
[    2.117391]  kmem_cache_alloc+0x304/0x350
[    2.120702]  security_inode_alloc+0x38/0xa4
[    2.124169]  inode_init_always+0xd0/0x264
[    2.127501]  alloc_inode+0x44/0xec
[    2.130325]  new_inode+0x28/0xc0
[    2.133011]  tracefs_create_file+0x74/0x1e0
[    2.136459]  init_tracer_tracefs+0x248/0x644
[    2.140030]  tracer_init_tracefs+0x9c/0x34c
[    2.143483]  do_one_initcall+0x44/0x170
[    2.146654]  do_initcalls+0x104/0x144
[    2.149704]  kernel_init_freeable+0x130/0x178

[...]

>  	trace(s, slab, object, 1);
>  	init_object(s, object, SLUB_RED_ACTIVE);
>  	return 1;
> @@ -1395,7 +1393,7 @@ static noinline int free_debug_processing(
>  	}
>  
>  	if (s->flags & SLAB_STORE_USER)
> -		set_track(s, object, TRACK_FREE, addr);
> +		set_track(s, object, TRACK_FREE, addr, GFP_NOWAIT);
>  	trace(s, slab, object, 0);
>  	/* Freepointer not overwritten by init_object(), SLAB_POISON moved it */
>  	init_object(s, object, SLUB_RED_INACTIVE);
> @@ -1632,7 +1630,8 @@ static inline
>  void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}
>  
>  static inline int alloc_debug_processing(struct kmem_cache *s,
> -	struct slab *slab, void *object, unsigned long addr) { return 0; }
> +	struct slab *slab, void *object, unsigned long addr,
> +	gfp_t flags) { return 0; }
>  
>  static inline int free_debug_processing(
>  	struct kmem_cache *s, struct slab *slab,
> @@ -3033,7 +3032,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  check_new_slab:
>  
>  	if (kmem_cache_debug(s)) {
> -		if (!alloc_debug_processing(s, slab, freelist, addr)) {
> +		if (!alloc_debug_processing(s, slab, freelist, addr, gfpflags)) {
>  			/* Slab failed checks. Next slab needed */
>  			goto new_slab;
>  		} else {
> @@ -4221,6 +4220,9 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
>  	s->remote_node_defrag_ratio = 1000;
>  #endif
>  
> +	if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> +		stack_depot_init();
> +

As mentioned in my report, it can crash the system when creating boot
caches with debugging enabled.

The rest looks fine!

>  	/* Initialize the pre-computed randomized freelist if slab is up */
>  	if (slab_state >= UP) {
>  		if (init_cache_random_seq(s))
> @@ -4352,18 +4354,26 @@ void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
>  	objp = fixup_red_left(s, objp);
>  	trackp = get_track(s, objp, TRACK_ALLOC);
>  	kpp->kp_ret = (void *)trackp->addr;
> -#ifdef CONFIG_STACKTRACE
> -	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
> -		kpp->kp_stack[i] = (void *)trackp->addrs[i];
> -		if (!kpp->kp_stack[i])
> -			break;
> -	}
> +#ifdef CONFIG_STACKDEPOT
> +	{
> +		depot_stack_handle_t handle;
> +		unsigned long *entries;
> +		unsigned int nr_entries;
> +
> +		handle = READ_ONCE(trackp->handle);
> +		if (handle) {
> +			nr_entries = stack_depot_fetch(handle, &entries);
> +			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
> +				kpp->kp_stack[i] = (void *)entries[i];
> +		}
>  
> -	trackp = get_track(s, objp, TRACK_FREE);
> -	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
> -		kpp->kp_free_stack[i] = (void *)trackp->addrs[i];
> -		if (!kpp->kp_free_stack[i])
> -			break;
> +		trackp = get_track(s, objp, TRACK_FREE);
> +		handle = READ_ONCE(trackp->handle);
> +		if (handle) {
> +			nr_entries = stack_depot_fetch(handle, &entries);
> +			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
> +				kpp->kp_free_stack[i] = (void *)entries[i];
> +		}
>  	}
>  #endif
>  #endif
> -- 
> 2.35.1
> 
> 

-- 
Thank you, You are awesome!
Hyeonggon :-)



* Re: [PATCH 1/5] mm/slub: move struct track init out of set_track()
  2022-02-25 18:03 ` [PATCH 1/5] mm/slub: move struct track init out of set_track() Vlastimil Babka
@ 2022-02-26 10:41   ` Hyeonggon Yoo
  0 siblings, 0 replies; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-02-26 10:41 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed

On Fri, Feb 25, 2022 at 07:03:14PM +0100, Vlastimil Babka wrote:
> set_track() either zeroes out the struct track or fills it, depending on
> the addr parameter. This is unnecessary as there's only one place that
> calls it for the initialization - init_tracking(). We can simply do the
> zeroing there, with a single memset() that covers both TRACK_ALLOC and
> TRACK_FREE as they are adjacent.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/slub.c | 32 +++++++++++++++-----------------
>  1 file changed, 15 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 261474092e43..1fc451f4fe62 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -729,34 +729,32 @@ static void set_track(struct kmem_cache *s, void *object,
>  {
>  	struct track *p = get_track(s, object, alloc);
>  
> -	if (addr) {
>  #ifdef CONFIG_STACKTRACE
> -		unsigned int nr_entries;
> +	unsigned int nr_entries;
>  
> -		metadata_access_enable();
> -		nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
> -					      TRACK_ADDRS_COUNT, 3);
> -		metadata_access_disable();
> +	metadata_access_enable();
> +	nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
> +				      TRACK_ADDRS_COUNT, 3);
> +	metadata_access_disable();
>  
> -		if (nr_entries < TRACK_ADDRS_COUNT)
> -			p->addrs[nr_entries] = 0;
> +	if (nr_entries < TRACK_ADDRS_COUNT)
> +		p->addrs[nr_entries] = 0;
>  #endif
> -		p->addr = addr;
> -		p->cpu = smp_processor_id();
> -		p->pid = current->pid;
> -		p->when = jiffies;
> -	} else {
> -		memset(p, 0, sizeof(struct track));
> -	}
> +	p->addr = addr;
> +	p->cpu = smp_processor_id();
> +	p->pid = current->pid;
> +	p->when = jiffies;
>  }
>  
>  static void init_tracking(struct kmem_cache *s, void *object)
>  {
> +	struct track *p;
> +
>  	if (!(s->flags & SLAB_STORE_USER))
>  		return;
>  
> -	set_track(s, object, TRACK_FREE, 0UL);
> -	set_track(s, object, TRACK_ALLOC, 0UL);
> +	p = get_track(s, object, TRACK_ALLOC);
> +	memset(p, 0, 2*sizeof(struct track));
>  }
>

Looks good.
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

And works nicely.
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

>  static void print_track(const char *s, struct track *t, unsigned long pr_time)
> -- 
> 2.35.1
> 
> 

-- 
Thank you, You are awesome!
Hyeonggon :-)



* Re: [PATCH 4/5] mm/slub: sort debugfs output by frequency of stack traces
  2022-02-25 18:03 ` [PATCH 4/5] mm/slub: sort debugfs output by frequency of stack traces Vlastimil Babka
@ 2022-02-26 11:03   ` Hyeonggon Yoo
  0 siblings, 0 replies; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-02-26 11:03 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed

On Fri, Feb 25, 2022 at 07:03:17PM +0100, Vlastimil Babka wrote:
> From: Oliver Glitta <glittao@gmail.com>
> 
> Sort the output of debugfs alloc_traces and free_traces by the frequency
> of allocation/freeing stack traces. Most frequently used stack traces
> will be printed first, e.g. for easier memory leak debugging.
> 
> Signed-off-by: Oliver Glitta <glittao@gmail.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/slub.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 06599db4faa3..a74afe59a403 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -38,6 +38,7 @@
>  #include <linux/memcontrol.h>
>  #include <linux/random.h>
>  #include <kunit/test.h>
> +#include <linux/sort.h>
>  
>  #include <linux/debugfs.h>
>  #include <trace/events/kmem.h>
> @@ -6150,6 +6151,17 @@ static void *slab_debugfs_next(struct seq_file *seq, void *v, loff_t *ppos)
>  	return NULL;
>  }
>  
> +static int cmp_loc_by_count(const void *a, const void *b, const void *data)
> +{
> +	struct location *loc1 = (struct location *)a;
> +	struct location *loc2 = (struct location *)b;
> +
> +	if (loc1->count > loc2->count)
> +		return -1;
> +	else
> +		return 1;
> +}
> +
>  static void *slab_debugfs_start(struct seq_file *seq, loff_t *ppos)
>  {
>  	struct loc_track *t = seq->private;
> @@ -6211,6 +6223,10 @@ static int slab_debug_trace_open(struct inode *inode, struct file *filep)
>  		spin_unlock_irqrestore(&n->list_lock, flags);
>  	}
>  
> +	/* Sort locations by count */
> +	sort_r(t->loc, t->count, sizeof(struct location),
> +		cmp_loc_by_count, NULL, NULL);
> +
>  	bitmap_free(obj_map);
>  	return 0;
>  }

This is so cool!

Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

> -- 
> 2.35.1
> 
> 

-- 
Thank you, You are awesome!
Hyeonggon :-)



* Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot
  2022-02-25 18:03 [PATCH 0/5] SLUB debugfs improvements based on stackdepot Vlastimil Babka
                   ` (5 preceding siblings ...)
  2022-02-26  7:19 ` [PATCH 0/5] SLUB debugfs improvements based on stackdepot Hyeonggon Yoo
@ 2022-02-26 12:18 ` Hyeonggon Yoo
  2022-03-04 17:25   ` Vlastimil Babka
  6 siblings, 1 reply; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-02-26 12:18 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet, Randy Dunlap

On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> Hi,
> 
> this series combines and revives patches from Oliver's last year
> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> files alloc_traces and free_traces more useful.
> The resubmission was blocked on stackdepot changes that are now merged,
> as explained in patch 2.
> 
> Patch 1 is a new preparatory cleanup.
> 
> Patch 2 originally submitted here [1], was merged to mainline but
> reverted for stackdepot related issues as explained in the patch.
> 
> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> be considered too intrusive so I will postpone it for later. The docs
> patch is adjusted accordingly.
> 

This problem is not caused by this patch series.
But I think it's worth mentioning...

It's really weird that some stack traces are not recorded
when CONFIG_KASAN=y.

I made sure that:
	- Stack Depot did not reach its limit
	- the free path happens on CONFIG_KASAN=y too.

I have no clue why this happens.

# cat dentry/free_traces (CONFIG_KASAN=y)
   6585 <not-available> age=4294912647 pid=0 cpus=0

# cat dentry/free_traces (CONFIG_KASAN=n)
   1246 <not-available> age=4294906877 pid=0 cpus=0
    379 __d_free+0x20/0x2c age=33/14225/14353 pid=0-122 cpus=0-3
        kmem_cache_free+0x1f4/0x21c
        __d_free+0x20/0x2c
        rcu_core+0x334/0x580
        rcu_core_si+0x14/0x20
        __do_softirq+0x12c/0x2a8

      2 dentry_free+0x58/0xb0 age=14101/14101/14101 pid=158 cpus=0
        kmem_cache_free+0x1f4/0x21c
        dentry_free+0x58/0xb0
        __dentry_kill+0x18c/0x1d0
        dput+0x1c4/0x2fc
        __fput+0xb0/0x230
        ____fput+0x14/0x20
        task_work_run+0x84/0x17c
        do_notify_resume+0x208/0x1330
        el0_svc+0x6c/0x80
        el0t_64_sync_handler+0xa8/0x130
        el0t_64_sync+0x1a0/0x1a4

      1 dentry_free+0x58/0xb0 age=7678 pid=190 cpus=1
        kmem_cache_free+0x1f4/0x21c
        dentry_free+0x58/0xb0
        __dentry_kill+0x18c/0x1d0
        dput+0x1c4/0x2fc
        __fput+0xb0/0x230
        ____fput+0x14/0x20
        task_work_run+0x84/0x17c
        do_exit+0x2dc/0x8e0
        do_group_exit+0x38/0xa4
        __wake_up_parent+0x0/0x34
        invoke_syscall+0x48/0x114
        el0_svc_common.constprop.0+0x44/0xfc
        do_el0_svc+0x2c/0x94
        el0_svc+0x28/0x80
        el0t_64_sync_handler+0xa8/0x130
        el0t_64_sync+0x1a0/0x1a4
-- 
Thank you, You are awesome!
Hyeonggon :-)



* Re: [PATCH 3/5] mm/slub: aggregate and print stack traces in debugfs files
  2022-02-25 18:03 ` [PATCH 3/5] mm/slub: aggregate and print stack traces in debugfs files Vlastimil Babka
@ 2022-02-27  0:18   ` Hyeonggon Yoo
  2022-02-27  0:22   ` Hyeonggon Yoo
  1 sibling, 0 replies; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-02-27  0:18 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed

On Fri, Feb 25, 2022 at 07:03:16PM +0100, Vlastimil Babka wrote:
> From: Oliver Glitta <glittao@gmail.com>
> 
> Aggregate objects in slub cache by stack trace in addition to caller
> address when producing contents of debugfs files alloc_traces and
> free_traces in debugfs. Also add the stack traces to the debugfs
> output. This makes it much more useful to e.g. debug memory leaks.
> 
> Signed-off-by: Oliver Glitta <glittao@gmail.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/slub.c | 28 ++++++++++++++++++++++++++--
>  1 file changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 3140f763e819..06599db4faa3 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5075,6 +5075,7 @@ EXPORT_SYMBOL(validate_slab_cache);
>   */
>  
>  struct location {
> +	depot_stack_handle_t handle;
>  	unsigned long count;
>  	unsigned long addr;
>  	long long sum_time;
> @@ -5127,9 +5128,13 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
>  {
>  	long start, end, pos;
>  	struct location *l;
> -	unsigned long caddr;
> +	unsigned long caddr, chandle;
>  	unsigned long age = jiffies - track->when;
> +	depot_stack_handle_t handle = 0;
>  
> +#ifdef CONFIG_STACKDEPOT
> +	handle = READ_ONCE(track->handle);
> +#endif
>  	start = -1;
>  	end = t->count;
>  
> @@ -5144,7 +5149,8 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
>  			break;
>  
>  		caddr = t->loc[pos].addr;
> -		if (track->addr == caddr) {
> +		chandle = t->loc[pos].handle;
> +		if ((track->addr == caddr) && (handle == chandle)) {
>
>  			l = &t->loc[pos];
>  			l->count++;
> @@ -5169,6 +5175,8 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
>  
>  		if (track->addr < caddr)
>  			end = pos;
> +		else if (track->addr == caddr && handle < chandle)
> +			end = pos;
>  		else
>  			start = pos;
>  	}
> @@ -5191,6 +5199,7 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
>  	l->max_time = age;
>  	l->min_pid = track->pid;
>  	l->max_pid = track->pid;
> +	l->handle = handle;
>  	cpumask_clear(to_cpumask(l->cpus));
>  	cpumask_set_cpu(track->cpu, to_cpumask(l->cpus));
>  	nodes_clear(l->nodes);
> @@ -6102,6 +6111,21 @@ static int slab_debugfs_show(struct seq_file *seq, void *v)
>  			seq_printf(seq, " nodes=%*pbl",
>  				 nodemask_pr_args(&l->nodes));
>  
> +#ifdef CONFIG_STACKDEPOT
> +		{
> +			depot_stack_handle_t handle;
> +			unsigned long *entries;
> +			unsigned int nr_entries, j;
> +
> +			handle = READ_ONCE(l->handle);
> +			if (handle) {
> +				nr_entries = stack_depot_fetch(handle, &entries);
> +				seq_puts(seq, "\n");
> +				for (j = 0; j < nr_entries; j++)
> +					seq_printf(seq, "        %pS\n", (void *)entries[j]);
> +			}
> +		}
> +#endif
>  		seq_puts(seq, "\n");
>  	}
>

Yeah, this is necessary as we collect not only the caller address but also
the stacks. Stacks can differ even if the caller address is the same,
so we need to aggregate by both caller address and handle.

This patch looks good.

Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

And it works nicely. After this patch I see now it can differentiate
by stack too.

Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

I like this so much. This makes {free,alloc}_traces much more useful.

before patch:
# cat alloc_traces 
   2924 __d_alloc+0x30/0x3ac age=1/13709/14330 pid=0-184 cpus=0-3

after patch:
# cat alloc_traces 
    757 __d_alloc+0x30/0x3b0 age=2041/7771/7874 pid=1-179 cpus=0-3
        __slab_alloc.constprop.0+0x30/0x74
        kmem_cache_alloc+0x2c0/0x300
        __d_alloc+0x30/0x3b0
        d_alloc_parallel+0xd8/0x824
        path_openat+0xadc/0x16bc
        do_filp_open+0xf8/0x1f4
        do_sys_openat2+0x120/0x26c
        __arm64_sys_openat+0xf0/0x160
        invoke_syscall+0x60/0x190
        el0_svc_common.constprop.0+0x7c/0x160
        do_el0_svc+0x88/0xa4
        el0_svc+0x3c/0x80
        el0t_64_sync_handler+0xa8/0x130
        el0t_64_sync+0x1a0/0x1a4

    301 __d_alloc+0x30/0x3b0 age=8217/8237/8309 pid=51 cpus=1-2
        __slab_alloc.constprop.0+0x30/0x74
        kmem_cache_alloc+0x2c0/0x300
        __d_alloc+0x30/0x3b0
        d_alloc+0x30/0xd0
        __lookup_hash+0x70/0xf0
        filename_create+0xf4/0x220

	[...]

> -- 
> 2.35.1
> 

-- 
Thank you, You are awesome!
Hyeonggon :-)



* Re: [PATCH 3/5] mm/slub: aggregate and print stack traces in debugfs files
  2022-02-25 18:03 ` [PATCH 3/5] mm/slub: aggregate and print stack traces in debugfs files Vlastimil Babka
  2022-02-27  0:18   ` Hyeonggon Yoo
@ 2022-02-27  0:22   ` Hyeonggon Yoo
  1 sibling, 0 replies; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-02-27  0:22 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed

On Fri, Feb 25, 2022 at 07:03:16PM +0100, Vlastimil Babka wrote:
> From: Oliver Glitta <glittao@gmail.com>
> 
> Aggregate objects in slub cache by stack trace in addition to caller
> address when producing contents of debugfs files alloc_traces and
> free_traces in debugfs. Also add the stack traces to the debugfs
> output. This makes it much more useful to e.g. debug memory leaks.
>

I think it would be better if the subject were something like
"Differentiate by stack and print stack traces in debugfs"?

> Signed-off-by: Oliver Glitta <glittao@gmail.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/slub.c | 28 ++++++++++++++++++++++++++--
>  1 file changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 3140f763e819..06599db4faa3 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5075,6 +5075,7 @@ EXPORT_SYMBOL(validate_slab_cache);
>   */
>  
>  struct location {
> +	depot_stack_handle_t handle;
>  	unsigned long count;
>  	unsigned long addr;
>  	long long sum_time;
> @@ -5127,9 +5128,13 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
>  {
>  	long start, end, pos;
>  	struct location *l;
> -	unsigned long caddr;
> +	unsigned long caddr, chandle;
>  	unsigned long age = jiffies - track->when;
> +	depot_stack_handle_t handle = 0;
>  
> +#ifdef CONFIG_STACKDEPOT
> +	handle = READ_ONCE(track->handle);
> +#endif
>  	start = -1;
>  	end = t->count;
>  
> @@ -5144,7 +5149,8 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
>  			break;
>  
>  		caddr = t->loc[pos].addr;
> -		if (track->addr == caddr) {
> +		chandle = t->loc[pos].handle;
> +		if ((track->addr == caddr) && (handle == chandle)) {
>  
>  			l = &t->loc[pos];
>  			l->count++;
> @@ -5169,6 +5175,8 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
>  
>  		if (track->addr < caddr)
>  			end = pos;
> +		else if (track->addr == caddr && handle < chandle)
> +			end = pos;
>  		else
>  			start = pos;
>  	}
> @@ -5191,6 +5199,7 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
>  	l->max_time = age;
>  	l->min_pid = track->pid;
>  	l->max_pid = track->pid;
> +	l->handle = handle;
>  	cpumask_clear(to_cpumask(l->cpus));
>  	cpumask_set_cpu(track->cpu, to_cpumask(l->cpus));
>  	nodes_clear(l->nodes);
> @@ -6102,6 +6111,21 @@ static int slab_debugfs_show(struct seq_file *seq, void *v)
>  			seq_printf(seq, " nodes=%*pbl",
>  				 nodemask_pr_args(&l->nodes));
>  
> +#ifdef CONFIG_STACKDEPOT
> +		{
> +			depot_stack_handle_t handle;
> +			unsigned long *entries;
> +			unsigned int nr_entries, j;
> +
> +			handle = READ_ONCE(l->handle);
> +			if (handle) {
> +				nr_entries = stack_depot_fetch(handle, &entries);
> +				seq_puts(seq, "\n");
> +				for (j = 0; j < nr_entries; j++)
> +					seq_printf(seq, "        %pS\n", (void *)entries[j]);
> +			}
> +		}
> +#endif
>  		seq_puts(seq, "\n");
>  	}
>  
> -- 
> 2.35.1
> 

-- 
Thank you, You are awesome!
Hyeonggon :-)



* [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable
  2022-02-25 18:03 ` [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects Vlastimil Babka
  2022-02-26 10:24   ` Hyeonggon Yoo
@ 2022-02-27  3:08   ` Hyeonggon Yoo
  2022-02-27  5:06     ` kernel test robot
                       ` (3 more replies)
  2022-02-27  9:44   ` [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects Hyeonggon Yoo
  2 siblings, 4 replies; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-02-27  3:08 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Dmitry Vyukov, Eric Dumazet,
	Jarkko Sakkinen, Johannes Berg, Yury Norov, Arnd Bergmann,
	James Bottomley, Matteo Croce, Marco Elver, Andrey Konovalov,
	Imran Khan, Zqiang, Hyeonggon Yoo

After commit 2dba5eb1c73b ("lib/stackdepot: allow optional init and
stack_table allocation by kvmalloc()"), stack_depot_init() is called
later if CONFIG_STACKDEPOT_ALWAYS_INIT=n to remove unnecessary memory
usage. It allocates stack_table using memblock_alloc() or kvmalloc(),
depending on whether the slab allocator is available.

But when stack_depot_init() is called while creating the boot slab caches,
neither the slab allocator nor memblock is available, so the kernel
crashes. Allocate stack_table from the page allocator when both the slab
allocator and memblock are unavailable.

Limit the size of stack_table when using the page allocator, because
vmalloc() is also unavailable in kmem_cache_init(); it must not be larger
than (PAGE_SIZE << (MAX_ORDER - 1)).

This patch was tested on both CONFIG_STACKDEPOT_ALWAYS_INIT=y and n.

Fixes: 2dba5eb1c73b ("lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()")
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 lib/stackdepot.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index bf5ba9af0500..606f80ae2bf7 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -73,6 +73,14 @@ static int next_slab_inited;
 static size_t depot_offset;
 static DEFINE_RAW_SPINLOCK(depot_lock);
 
+static unsigned int stack_hash_size = (1 << CONFIG_STACK_HASH_ORDER);
+static inline unsigned int stack_hash_mask(void)
+{
+	return stack_hash_size - 1;
+}
+
+#define STACK_HASH_SEED 0x9747b28c
+
 static bool init_stack_slab(void **prealloc)
 {
 	if (!*prealloc)
@@ -142,10 +150,6 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 	return stack;
 }
 
-#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER)
-#define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
-#define STACK_HASH_SEED 0x9747b28c
-
 static bool stack_depot_disable;
 static struct stack_record **stack_table;
 
@@ -172,18 +176,28 @@ __ref int stack_depot_init(void)
 
 	mutex_lock(&stack_depot_init_mutex);
 	if (!stack_depot_disable && !stack_table) {
-		size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
+		size_t size = (stack_hash_size * sizeof(struct stack_record *));
 		int i;
 
 		if (slab_is_available()) {
 			pr_info("Stack Depot allocating hash table with kvmalloc\n");
 			stack_table = kvmalloc(size, GFP_KERNEL);
+		} else if (totalram_pages() > 0) {
+			/* Reduce size because vmalloc may be unavailable */
+			size = min(size, PAGE_SIZE << (MAX_ORDER - 1));
+			stack_hash_size = size / sizeof(struct stack_record *);
+
+			pr_info("Stack Depot allocating hash table with __get_free_pages\n");
+			stack_table = (struct stack_record **)
+				      __get_free_pages(GFP_KERNEL, get_order(size));
 		} else {
 			pr_info("Stack Depot allocating hash table with memblock_alloc\n");
 			stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
 		}
+
 		if (stack_table) {
-			for (i = 0; i < STACK_HASH_SIZE;  i++)
+			pr_info("Stack Depot hash table size=%u\n", stack_hash_size);
+			for (i = 0; i < stack_hash_size;  i++)
 				stack_table[i] = NULL;
 		} else {
 			pr_err("Stack Depot hash table allocation failed, disabling\n");
@@ -363,7 +377,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 		goto fast_exit;
 
 	hash = hash_stack(entries, nr_entries);
-	bucket = &stack_table[hash & STACK_HASH_MASK];
+	bucket = &stack_table[hash & stack_hash_mask()];
 
 	/*
 	 * Fast path: look the stack trace up without locking.
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 5/5] slab, documentation: add description of debugfs files for SLUB caches
  2022-02-25 18:03 ` [PATCH 5/5] slab, documentation: add description of debugfs files for SLUB caches Vlastimil Babka
@ 2022-02-27  3:49   ` Hyeonggon Yoo
  2022-03-02 16:31     ` Vlastimil Babka
  0 siblings, 1 reply; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-02-27  3:49 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet, Randy Dunlap,
	linux-doc

On Fri, Feb 25, 2022 at 07:03:18PM +0100, Vlastimil Babka wrote:
> From: Oliver Glitta <glittao@gmail.com>
> 
> Add description of debugfs files alloc_traces and free_traces
> to SLUB cache documentation.
> 
> [ vbabka@suse.cz: some rewording ]
> 
> Signed-off-by: Oliver Glitta <glittao@gmail.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: linux-doc@vger.kernel.org
> ---
>  Documentation/vm/slub.rst | 61 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
> 
> diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
> index d3028554b1e9..2b2b931e59fc 100644
> --- a/Documentation/vm/slub.rst
> +++ b/Documentation/vm/slub.rst
> @@ -384,5 +384,66 @@ c) Execute ``slabinfo-gnuplot.sh`` in '-t' mode, passing all of the
>        40,60`` range will plot only samples collected between 40th and
>        60th seconds).
>  
> +
> +DebugFS files for SLUB
> +======================
> +
> +For more information about current state of SLUB caches with the user tracking
> +debug option enabled, debugfs files are available, typically under
> +/sys/kernel/debug/slab/<cache>/ (created only for caches with enabled user
> +tracking). There are 2 types of these files with the following debug
> +information:
> +
> +1. alloc_traces::
> +
> +    Prints information about unique allocation traces of the currently
> +    allocated objects. The output is sorted by frequency of each trace.
> +
> +    Information in the output:
> +    Number of objects, allocating function, minimal/average/maximal jiffies since alloc,
> +    pid range of the allocating processes, cpu mask of allocating cpus, and stack trace.
> +
> +    Example:::
> +
> +    1085 populate_error_injection_list+0x97/0x110 age=166678/166680/166682 pid=1 cpus=1::
> +	__slab_alloc+0x6d/0x90
> +	kmem_cache_alloc_trace+0x2eb/0x300
> +	populate_error_injection_list+0x97/0x110
> +	init_error_injection+0x1b/0x71
> +	do_one_initcall+0x5f/0x2d0
> +	kernel_init_freeable+0x26f/0x2d7
> +	kernel_init+0xe/0x118
> +	ret_from_fork+0x22/0x30
> +
> +
> +2. free_traces::
> +
> +    Prints information about unique free traces of the currently free objects,
> +    sorted by their frequency.
> +

I'm not sure that it's traces of the "currently free objects".

static int slab_debug_trace_open(struct inode *inode, struct file *filep)
{
	[...]
        
	obj_map = bitmap_alloc(oo_objects(s->oo), GFP_KERNEL);
	
	[...]

        for_each_kmem_cache_node(s, node, n) {
                unsigned long flags;
                struct slab *slab;

                if (!atomic_long_read(&n->nr_slabs))
                        continue;

                spin_lock_irqsave(&n->list_lock, flags);
                list_for_each_entry(slab, &n->partial, slab_list)
                        process_slab(t, s, slab, alloc, obj_map);
                list_for_each_entry(slab, &n->full, slab_list)
                        process_slab(t, s, slab, alloc, obj_map);
                spin_unlock_irqrestore(&n->list_lock, flags);
        }

	[...]

}

static void __fill_map(unsigned long *obj_map, struct kmem_cache *s,
                       struct slab *slab)
{
        void *addr = slab_address(slab);
        void *p;

        bitmap_zero(obj_map, slab->objects);

        for (p = slab->freelist; p; p = get_freepointer(s, p))
                set_bit(__obj_to_index(s, addr, p), obj_map);
}

static void process_slab(struct loc_track *t, struct kmem_cache *s,
                struct slab *slab, enum track_item alloc,
                unsigned long *obj_map)
{
        void *addr = slab_address(slab);
        void *p;

        __fill_map(obj_map, s, slab);

        for_each_object(p, s, addr, slab->objects)
                if (!test_bit(__obj_to_index(s, addr, p), obj_map))
                        add_location(t, s, get_track(s, p, alloc));
}

I think it's not traces of "currently free objects", because the index
bits of the free objects are set in the obj_map bitmap and those objects
are then skipped?

It's a bit odd, but I think these are the traces of currently allocated
objects that have been freed at least once (or <not available>).
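
To spell out my reading of the flow above as annotations (these comments
are mine, not in the source):

    /*
     * __fill_map()   - sets a bit for every object currently on the
     *                  slab's freelist, i.e. for the objects that are
     *                  free right now.
     * process_slab() - calls add_location() only for objects whose bit
     *                  is NOT set, i.e. only for currently allocated
     *                  objects.
     *
     * So free_traces ends up printing the TRACK_FREE track of currently
     * allocated objects - the stack of their previous free - rather than
     * the traces of objects that are free right now.
     */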

I think we can fix the code or doc?

Please tell me if I'm missing something :)

> +    Information in the output:
> +    Number of objects, freeing function, minimal/average/maximal jiffies since free,
> +    pid range of the freeing processes, cpu mask of freeing cpus, and stack trace.
> +
> +    Example:::
> +
> +    51 acpi_ut_update_ref_count+0x6a6/0x782 age=236886/237027/237772 pid=1 cpus=1
> +	kfree+0x2db/0x420
> +	acpi_ut_update_ref_count+0x6a6/0x782
> +	acpi_ut_update_object_reference+0x1ad/0x234
> +	acpi_ut_remove_reference+0x7d/0x84
> +	acpi_rs_get_prt_method_data+0x97/0xd6
> +	acpi_get_irq_routing_table+0x82/0xc4
> +	acpi_pci_irq_find_prt_entry+0x8e/0x2e0
> +	acpi_pci_irq_lookup+0x3a/0x1e0
> +	acpi_pci_irq_enable+0x77/0x240
> +	pcibios_enable_device+0x39/0x40
> +	do_pci_enable_device.part.0+0x5d/0xe0
> +	pci_enable_device_flags+0xfc/0x120
> +	pci_enable_device+0x13/0x20
> +	virtio_pci_probe+0x9e/0x170
> +	local_pci_probe+0x48/0x80
> +	pci_device_probe+0x105/0x1c0
> +

Everything else looks nice!

>  Christoph Lameter, May 30, 2007
>  Sergey Senozhatsky, October 23, 2015
> -- 
> 2.35.1
> 
> 

-- 
Thank you, You are awesome!
Hyeonggon :-)


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable
  2022-02-27  3:08   ` [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable Hyeonggon Yoo
@ 2022-02-27  5:06     ` kernel test robot
  2022-02-27  9:23     ` [PATCH v2] " Hyeonggon Yoo
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 46+ messages in thread
From: kernel test robot @ 2022-02-27  5:06 UTC (permalink / raw)
  To: Hyeonggon Yoo, Vlastimil Babka
  Cc: llvm, kbuild-all, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton,
	Linux Memory Management List, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Dmitry Vyukov, Eric Dumazet,
	Jarkko Sakkinen, Johannes Berg, Yury Norov, Arnd Bergmann,
	James Bottomley, Matteo Croce, Marco Elver, Andrey Konovalov,
	Imran Khan, Zqiang, Hyeonggon Yoo

Hi Hyeonggon,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.17-rc5 next-20220225]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Hyeonggon-Yoo/lib-stackdepot-Use-page-allocator-if-both-slab-and-memblock-is-unavailable/20220227-111029
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 2293be58d6a18cab800e25e42081bacb75c05752
config: hexagon-randconfig-r005-20220227 (https://download.01.org/0day-ci/archive/20220227/202202271324.VLaJlMYW-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d271fc04d5b97b12e6b797c6067d3c96a8d7470e)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/fd37f88eccc357002cc03a6a5fac60fb42552bc7
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Hyeonggon-Yoo/lib-stackdepot-Use-page-allocator-if-both-slab-and-memblock-is-unavailable/20220227-111029
        git checkout fd37f88eccc357002cc03a6a5fac60fb42552bc7
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> lib/stackdepot.c:187:11: warning: comparison of distinct pointer types ('typeof (size) *' (aka 'unsigned int *') and 'typeof ((1UL << 14) << (11 - 1)) *' (aka 'unsigned long *')) [-Wcompare-distinct-pointer-types]
                           size = min(size, PAGE_SIZE << (MAX_ORDER - 1));
                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/minmax.h:45:19: note: expanded from macro 'min'
   #define min(x, y)       __careful_cmp(x, y, <)
                           ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/minmax.h:36:24: note: expanded from macro '__careful_cmp'
           __builtin_choose_expr(__safe_cmp(x, y), \
                                 ^~~~~~~~~~~~~~~~
   include/linux/minmax.h:26:4: note: expanded from macro '__safe_cmp'
                   (__typecheck(x, y) && __no_side_effects(x, y))
                    ^~~~~~~~~~~~~~~~~
   include/linux/minmax.h:20:28: note: expanded from macro '__typecheck'
           (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                      ~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~
   1 warning generated.


vim +187 lib/stackdepot.c

   168	
   169	/*
   170	 * __ref because of memblock_alloc(), which will not be actually called after
   171	 * the __init code is gone, because at that point slab_is_available() is true
   172	 */
   173	__ref int stack_depot_init(void)
   174	{
   175		static DEFINE_MUTEX(stack_depot_init_mutex);
   176	
   177		mutex_lock(&stack_depot_init_mutex);
   178		if (!stack_depot_disable && !stack_table) {
   179			size_t size = (stack_hash_size * sizeof(struct stack_record *));
   180			int i;
   181	
   182			if (slab_is_available()) {
   183				pr_info("Stack Depot allocating hash table with kvmalloc\n");
   184				stack_table = kvmalloc(size, GFP_KERNEL);
   185			} else if (totalram_pages() > 0) {
   186				/* Reduce size because vmalloc may be unavailable */
 > 187				size = min(size, PAGE_SIZE << (MAX_ORDER - 1));
   188				stack_hash_size = size / sizeof(struct stack_record *);
   189	
   190				pr_info("Stack Depot allocating hash table with __get_free_pages\n");
   191				stack_table = (struct stack_record **)
   192					      __get_free_pages(GFP_KERNEL, get_order(size));
   193			} else {
   194				pr_info("Stack Depot allocating hash table with memblock_alloc\n");
   195				stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
   196			}
   197	
   198			if (stack_table) {
   199				pr_info("Stack Depot hash table size=%u\n", stack_hash_size);
   200				for (i = 0; i < stack_hash_size;  i++)
   201					stack_table[i] = NULL;
   202			} else {
   203				pr_err("Stack Depot hash table allocation failed, disabling\n");
   204				stack_depot_disable = true;
   205				mutex_unlock(&stack_depot_init_mutex);
   206				return -ENOMEM;
   207			}
   208		}
   209		mutex_unlock(&stack_depot_init_mutex);
   210		return 0;
   211	}
   212	EXPORT_SYMBOL_GPL(stack_depot_init);
   213	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2] lib/stackdepot: Use page allocator if both slab and memblock is unavailable
  2022-02-27  3:08   ` [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable Hyeonggon Yoo
  2022-02-27  5:06     ` kernel test robot
@ 2022-02-27  9:23     ` Hyeonggon Yoo
  2022-02-27 10:00     ` [PATCH] " kernel test robot
  2022-02-28  7:00     ` Marco Elver
  3 siblings, 0 replies; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-02-27  9:23 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Dmitry Vyukov, Eric Dumazet,
	Jarkko Sakkinen, Johannes Berg, Yury Norov, Arnd Bergmann,
	James Bottomley, Matteo Croce, Marco Elver, Andrey Konovalov,
	Imran Khan, Zqiang

After commit 2dba5eb1c73b ("lib/stackdepot: allow optional init and
stack_table allocation by kvmalloc()"), stack_depot_init() is called
later if CONFIG_STACKDEPOT_ALWAYS_INIT=n to remove unnecessary memory
usage. It allocates stack_table using memblock_alloc() or kvmalloc(),
depending on whether the slab allocator is available.

But when stack_depot_init() is called while creating the boot slab caches,
neither the slab allocator nor memblock is available, so the kernel
crashes. Allocate stack_table from the page allocator when both the slab
allocator and memblock are unavailable.

Limit the size of stack_table when using the page allocator, because
vmalloc() is also unavailable in kmem_cache_init(); it must not be larger
than (PAGE_SIZE << (MAX_ORDER - 1)).

This patch was tested on both CONFIG_STACKDEPOT_ALWAYS_INIT=y and n.

[ lkp@intel.com: Fix W=1 build warning ]

Fixes: 2dba5eb1c73b ("lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()")
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 lib/stackdepot.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index bf5ba9af0500..a96f8fd78c42 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -73,6 +73,14 @@ static int next_slab_inited;
 static size_t depot_offset;
 static DEFINE_RAW_SPINLOCK(depot_lock);
 
+static size_t stack_hash_size = (1 << CONFIG_STACK_HASH_ORDER);
+static inline size_t stack_hash_mask(void)
+{
+	return stack_hash_size - 1;
+}
+
+#define STACK_HASH_SEED 0x9747b28c
+
 static bool init_stack_slab(void **prealloc)
 {
 	if (!*prealloc)
@@ -142,10 +150,6 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 	return stack;
 }
 
-#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER)
-#define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
-#define STACK_HASH_SEED 0x9747b28c
-
 static bool stack_depot_disable;
 static struct stack_record **stack_table;
 
@@ -172,18 +176,28 @@ __ref int stack_depot_init(void)
 
 	mutex_lock(&stack_depot_init_mutex);
 	if (!stack_depot_disable && !stack_table) {
-		size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
+		size_t size = (stack_hash_size * sizeof(struct stack_record *));
 		int i;
 
 		if (slab_is_available()) {
 			pr_info("Stack Depot allocating hash table with kvmalloc\n");
 			stack_table = kvmalloc(size, GFP_KERNEL);
+		} else if (totalram_pages() > 0) {
+			/* Reduce size because vmalloc may be unavailable */
+			size = min_t(size_t, size, PAGE_SIZE << (MAX_ORDER - 1));
+			stack_hash_size = size / sizeof(struct stack_record *);
+
+			pr_info("Stack Depot allocating hash table with __get_free_pages\n");
+			stack_table = (struct stack_record **)
+				      __get_free_pages(GFP_KERNEL, get_order(size));
 		} else {
 			pr_info("Stack Depot allocating hash table with memblock_alloc\n");
 			stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
 		}
+
 		if (stack_table) {
-			for (i = 0; i < STACK_HASH_SIZE;  i++)
+			pr_info("Stack Depot hash table size=%zu\n", stack_hash_size);
+			for (i = 0; i < stack_hash_size;  i++)
 				stack_table[i] = NULL;
 		} else {
 			pr_err("Stack Depot hash table allocation failed, disabling\n");
@@ -363,7 +377,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 		goto fast_exit;
 
 	hash = hash_stack(entries, nr_entries);
-	bucket = &stack_table[hash & STACK_HASH_MASK];
+	bucket = &stack_table[hash & stack_hash_mask()];
 
 	/*
 	 * Fast path: look the stack trace up without locking.
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects
  2022-02-25 18:03 ` [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects Vlastimil Babka
  2022-02-26 10:24   ` Hyeonggon Yoo
  2022-02-27  3:08   ` [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable Hyeonggon Yoo
@ 2022-02-27  9:44   ` Hyeonggon Yoo
  2022-03-02 16:51     ` Vlastimil Babka
  2 siblings, 1 reply; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-02-27  9:44 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed

On Fri, Feb 25, 2022 at 07:03:15PM +0100, Vlastimil Babka wrote:
> From: Oliver Glitta <glittao@gmail.com>
> 
> Many stack traces are similar so there are many similar arrays.
> Stackdepot saves each unique stack only once.
>
> Replace field addrs in struct track with depot_stack_handle_t handle.  Use
> stackdepot to save stack trace.
>

I think it's not a replacement?

> The benefits are smaller memory overhead and possibility to aggregate
> per-cache statistics in the following patch using the stackdepot handle
> instead of matching stacks manually.
> 
> [ vbabka@suse.cz: rebase to 5.17-rc1 and adjust accordingly ]
> 
> This was initially merged as commit 788691464c29 and reverted by commit
> ae14c63a9f20 due to several issues, that should now be fixed.
> The problem of unconditional memory overhead by stackdepot has been
> addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init
> and stack_table allocation by kvmalloc()"), so the dependency on
> stackdepot will result in extra memory usage only when a slab cache
> tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
> The build failures on some architectures were also addressed, and the
> reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this
> patch.

This is just an idea, and it goes beyond this patch.

After this patch we have external storage that records the stack traces.

It's possible that some rare stack traces are still in the stack depot but
are no longer reachable, because the track that referenced them has been
overwritten.

I think it would be worth implementing a way to iterate over the stacks in
the stack depot?
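
Something like the sketch below, perhaps? (Completely made up - no such
API exists in lib/stackdepot.c today, and a real version would also need
to take depot_lock while walking; this is just to illustrate the idea.)

	/* Walk every record currently stored in the depot. */
	static void stack_depot_for_each(void (*fn)(struct stack_record *rec,
						    void *arg), void *arg)
	{
		unsigned long i;

		if (!stack_table)
			return;

		for (i = 0; i < STACK_HASH_SIZE; i++) {
			struct stack_record *rec;

			for (rec = stack_table[i]; rec; rec = rec->next)
				fn(rec, arg);
		}
	}

The debugfs side could then also dump traces whose track has since been
overwritten.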

> 
> Signed-off-by: Oliver Glitta <glittao@gmail.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>

-- 
Thank you, You are awesome!
Hyeonggon :-)


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable
  2022-02-27  3:08   ` [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable Hyeonggon Yoo
  2022-02-27  5:06     ` kernel test robot
  2022-02-27  9:23     ` [PATCH v2] " Hyeonggon Yoo
@ 2022-02-27 10:00     ` kernel test robot
  2022-02-28  7:00     ` Marco Elver
  3 siblings, 0 replies; 46+ messages in thread
From: kernel test robot @ 2022-02-27 10:00 UTC (permalink / raw)
  To: Hyeonggon Yoo, Vlastimil Babka
  Cc: kbuild-all, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton,
	Linux Memory Management List, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Dmitry Vyukov, Eric Dumazet,
	Jarkko Sakkinen, Johannes Berg, Yury Norov, Arnd Bergmann,
	James Bottomley, Matteo Croce, Marco Elver, Andrey Konovalov,
	Imran Khan, Zqiang, Hyeonggon Yoo

Hi Hyeonggon,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.17-rc5 next-20220225]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Hyeonggon-Yoo/lib-stackdepot-Use-page-allocator-if-both-slab-and-memblock-is-unavailable/20220227-111029
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 2293be58d6a18cab800e25e42081bacb75c05752
config: i386-randconfig-s002 (https://download.01.org/0day-ci/archive/20220227/202202271714.D69JHjzb-lkp@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/0day-ci/linux/commit/fd37f88eccc357002cc03a6a5fac60fb42552bc7
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Hyeonggon-Yoo/lib-stackdepot-Use-page-allocator-if-both-slab-and-memblock-is-unavailable/20220227-111029
        git checkout fd37f88eccc357002cc03a6a5fac60fb42552bc7
        # save the config file to linux build tree
        mkdir build_dir
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> lib/stackdepot.c:187:32: sparse: sparse: incompatible types in comparison expression (different type sizes):
>> lib/stackdepot.c:187:32: sparse:    unsigned int *
>> lib/stackdepot.c:187:32: sparse:    unsigned long *

vim +187 lib/stackdepot.c

   168	
   169	/*
   170	 * __ref because of memblock_alloc(), which will not be actually called after
   171	 * the __init code is gone, because at that point slab_is_available() is true
   172	 */
   173	__ref int stack_depot_init(void)
   174	{
   175		static DEFINE_MUTEX(stack_depot_init_mutex);
   176	
   177		mutex_lock(&stack_depot_init_mutex);
   178		if (!stack_depot_disable && !stack_table) {
   179			size_t size = (stack_hash_size * sizeof(struct stack_record *));
   180			int i;
   181	
   182			if (slab_is_available()) {
   183				pr_info("Stack Depot allocating hash table with kvmalloc\n");
   184				stack_table = kvmalloc(size, GFP_KERNEL);
   185			} else if (totalram_pages() > 0) {
   186				/* Reduce size because vmalloc may be unavailable */
 > 187				size = min(size, PAGE_SIZE << (MAX_ORDER - 1));
   188				stack_hash_size = size / sizeof(struct stack_record *);
   189	
   190				pr_info("Stack Depot allocating hash table with __get_free_pages\n");
   191				stack_table = (struct stack_record **)
   192					      __get_free_pages(GFP_KERNEL, get_order(size));
   193			} else {
   194				pr_info("Stack Depot allocating hash table with memblock_alloc\n");
   195				stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
   196			}
   197	
   198			if (stack_table) {
   199				pr_info("Stack Depot hash table size=%u\n", stack_hash_size);
   200				for (i = 0; i < stack_hash_size;  i++)
   201					stack_table[i] = NULL;
   202			} else {
   203				pr_err("Stack Depot hash table allocation failed, disabling\n");
   204				stack_depot_disable = true;
   205				mutex_unlock(&stack_depot_init_mutex);
   206				return -ENOMEM;
   207			}
   208		}
   209		mutex_unlock(&stack_depot_init_mutex);
   210		return 0;
   211	}
   212	EXPORT_SYMBOL_GPL(stack_depot_init);
   213	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable
  2022-02-27  3:08   ` [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable Hyeonggon Yoo
                       ` (2 preceding siblings ...)
  2022-02-27 10:00     ` [PATCH] " kernel test robot
@ 2022-02-28  7:00     ` Marco Elver
  2022-02-28 10:05       ` Hyeonggon Yoo
  3 siblings, 1 reply; 46+ messages in thread
From: Marco Elver @ 2022-02-28  7:00 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Vlastimil Babka, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton, linux-mm, patches,
	linux-kernel, Oliver Glitta, Faiyaz Mohammed, Dmitry Vyukov,
	Eric Dumazet, Jarkko Sakkinen, Johannes Berg, Yury Norov,
	Arnd Bergmann, James Bottomley, Matteo Croce, Andrey Konovalov,
	Imran Khan, Zqiang

On Sun, 27 Feb 2022 at 04:08, Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
>
> After commit 2dba5eb1c73b ("lib/stackdepot: allow optional init and
> stack_table allocation by kvmalloc()"), stack_depot_init() is called
> later if CONFIG_STACKDEPOT_ALWAYS_INIT=n to remove unnecessary memory
> usage. It allocates stack_table using memblock_alloc() or kvmalloc()
> depending on availability of slab allocator.
>
> But when stack_depot_init() is called while creating boot slab caches,
> both slab allocator and memblock is not available. So kernel crashes.
> Allocate stack_table from page allocator when both slab allocator and
> memblock is unavailable.

This is odd - who is calling stack_depot_init() while neither slab nor
memblock are available? Do you have a stacktrace?

> Limit size of stack_table when using page allocator because vmalloc()
> is also unavailable in kmem_cache_init(). it must not be larger than
> (PAGE_SIZE << (MAX_ORDER - 1)).
>
> This patch was tested on both CONFIG_STACKDEPOT_ALWAYS_INIT=y and n.
>
> Fixes: 2dba5eb1c73b ("lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()")
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> ---
>  lib/stackdepot.c | 28 +++++++++++++++++++++-------
>  1 file changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index bf5ba9af0500..606f80ae2bf7 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -73,6 +73,14 @@ static int next_slab_inited;
>  static size_t depot_offset;
>  static DEFINE_RAW_SPINLOCK(depot_lock);
>
> +static unsigned int stack_hash_size = (1 << CONFIG_STACK_HASH_ORDER);
> +static inline unsigned int stack_hash_mask(void)
> +{
> +       return stack_hash_size - 1;
> +}
> +
> +#define STACK_HASH_SEED 0x9747b28c
> +
>  static bool init_stack_slab(void **prealloc)
>  {
>         if (!*prealloc)
> @@ -142,10 +150,6 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
>         return stack;
>  }
>
> -#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER)
> -#define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
> -#define STACK_HASH_SEED 0x9747b28c
> -
>  static bool stack_depot_disable;
>  static struct stack_record **stack_table;
>
> @@ -172,18 +176,28 @@ __ref int stack_depot_init(void)
>
>         mutex_lock(&stack_depot_init_mutex);
>         if (!stack_depot_disable && !stack_table) {
> -               size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
> +               size_t size = (stack_hash_size * sizeof(struct stack_record *));
>                 int i;
>
>                 if (slab_is_available()) {
>                         pr_info("Stack Depot allocating hash table with kvmalloc\n");
>                         stack_table = kvmalloc(size, GFP_KERNEL);
> +               } else if (totalram_pages() > 0) {
> +                       /* Reduce size because vmalloc may be unavailable */
> +                       size = min(size, PAGE_SIZE << (MAX_ORDER - 1));
> +                       stack_hash_size = size / sizeof(struct stack_record *);
> +
> +                       pr_info("Stack Depot allocating hash table with __get_free_pages\n");
> +                       stack_table = (struct stack_record **)
> +                                     __get_free_pages(GFP_KERNEL, get_order(size));
>                 } else {
>                         pr_info("Stack Depot allocating hash table with memblock_alloc\n");
>                         stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
>                 }
> +
>                 if (stack_table) {
> -                       for (i = 0; i < STACK_HASH_SIZE;  i++)
> +                       pr_info("Stack Depot hash table size=%u\n", stack_hash_size);
> +                       for (i = 0; i < stack_hash_size;  i++)
>                                 stack_table[i] = NULL;
>                 } else {
>                         pr_err("Stack Depot hash table allocation failed, disabling\n");
> @@ -363,7 +377,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>                 goto fast_exit;
>
>         hash = hash_stack(entries, nr_entries);
> -       bucket = &stack_table[hash & STACK_HASH_MASK];
> +       bucket = &stack_table[hash & stack_hash_mask()];
>
>         /*
>          * Fast path: look the stack trace up without locking.
> --
> 2.33.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable
  2022-02-28  7:00     ` Marco Elver
@ 2022-02-28 10:05       ` Hyeonggon Yoo
  2022-02-28 10:50         ` Marco Elver
  0 siblings, 1 reply; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-02-28 10:05 UTC (permalink / raw)
  To: Marco Elver
  Cc: Vlastimil Babka, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton, linux-mm, patches,
	linux-kernel, Oliver Glitta, Faiyaz Mohammed, Dmitry Vyukov,
	Eric Dumazet, Jarkko Sakkinen, Johannes Berg, Yury Norov,
	Arnd Bergmann, James Bottomley, Matteo Croce, Andrey Konovalov,
	Imran Khan, Zqiang

On Mon, Feb 28, 2022 at 08:00:00AM +0100, Marco Elver wrote:
> On Sun, 27 Feb 2022 at 04:08, Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
> >
> > After commit 2dba5eb1c73b ("lib/stackdepot: allow optional init and
> > stack_table allocation by kvmalloc()"), stack_depot_init() is called
> > later if CONFIG_STACKDEPOT_ALWAYS_INIT=n to remove unnecessary memory
> > usage. It allocates stack_table using memblock_alloc() or kvmalloc()
> > depending on availability of slab allocator.
> >
> > But when stack_depot_init() is called while creating boot slab caches,
> > both slab allocator and memblock is not available. So kernel crashes.
> > Allocate stack_table from page allocator when both slab allocator and
> > memblock is unavailable.
> 
> This is odd - who is calling stack_depot_init() while neither slab nor
> memblock are available? 

It's not merged yet - but Oliver's patch (2/5) in his series [1] does:
if the user is debugging a cache, it calls stack_depot_init() when the
cache is created.

> @@ -4221,6 +4220,9 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
>  	s->remote_node_defrag_ratio = 1000;
>  #endif
> 
> +	if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> +		stack_depot_init();
> +

Oliver's patch series enables stack depot when the arch supports
stacktrace, to store slab objects' stack traces (as a SLUB debugging
feature).

Because SLUB debugging is turned on by default, commit 2dba5eb1c73b
("lib/stackdepot: allow optional init and stack_table allocation by
kvmalloc()") made it possible for stack_depot_init() to be called later.

With Oliver's patch applied, stack_depot_init() can be called in
contexts below:

  1) only memblock available (for kasan)
  2) only buddy available, vmalloc/memblock unavailable (for boot caches)
  3) buddy/slab available, vmalloc/memblock unavailable (vmap_area cache)
  4) buddy/slab/vmalloc available, memblock unavailable (other caches)

SLUB supports enabling debugging for a specific cache by passing the
slub_debug boot parameter (e.g. slub_debug=U,kmalloc-64). As slab caches
can be created in various contexts, stack_depot_init() has to handle all
of the contexts above.

Writing this, I realized my patch does not handle case 3).. I'll send v3.

[1] https://lore.kernel.org/linux-mm/YhoakP7Kih%2FYUgiN@ip-172-31-19-208.ap-northeast-1.compute.internal/T/#t
[2] https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1

> Do you have a stacktrace?

Yeah, here:

You can reproduce this on vbabka's slab-stackdepot-v1 branch [2] with
slub_debug=U, and CONFIG_STACKDEPOT_ALWAYS_INIT=n

[    0.000000] Stack Depot allocating hash table with memblock_alloc
[    0.000000] Unable to handle kernel paging request at virtual address ffff000097400000
[    0.000000] Mem abort info:
[    0.000000]   ESR = 0x96000047
[    0.000000]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.000000]   SET = 0, FnV = 0
[    0.000000]   EA = 0, S1PTW = 0
[    0.000000]   FSC = 0x07: level 3 translation fault
[    0.000000] Data abort info:
[    0.000000]   ISV = 0, ISS = 0x00000047
[    0.000000]   CM = 0, WnR = 1
[    0.000000] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041719000
[    0.000000] [ffff000097400000] pgd=18000000dcff8003, p4d=18000000dcff8003, pud=18000000dcbfe003, pmd=18000000dcb43003, pte=00680000d7400706
[    0.000000] Internal error: Oops: 96000047 [#1] PREEMPT SMP
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-rc1-11918-gbf5d03166d75 #51
[    0.000000] Hardware name: linux,dummy-virt (DT)
[    0.000000] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    0.000000] pc : __memset+0x16c/0x188
[    0.000000] lr : memblock_alloc_try_nid+0xcc/0xe4
[    0.000000] sp : ffff800009a33cd0
[    0.000000] x29: ffff800009a33cd0 x28: 0000000041720018 x27: ffff800009362640
[    0.000000] x26: ffff800009362640 x25: 0000000000000000 x24: 0000000000000000
[    0.000000] x23: 0000000000002000 x22: ffff80000932bb50 x21: 00000000ffffffff
[    0.000000] x20: ffff000097400000 x19: 0000000000800000 x18: ffffffffffffffff
[    0.000000] x17: 373578302f383278 x16: 302b657461657263 x15: 0000001000000000
[    0.000000] x14: 0000000000000360 x13: 0000000000009f8c x12: 00000000dcb0c070
[    0.000000] x11: 0000001000000000 x10: 00000000004ea000 x9 : 0000000000000000
[    0.000000] x8 : ffff000097400000 x7 : 0000000000000000 x6 : 000000000000003f
[    0.000000] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000004
[    0.000000] x2 : 00000000007fffc0 x1 : 0000000000000000 x0 : ffff000097400000
[    0.000000] Call trace:
[    0.000000]  __memset+0x16c/0x188
[    0.000000]  stack_depot_init+0xc8/0x100
[    0.000000]  __kmem_cache_create+0x454/0x570
[    0.000000]  create_boot_cache+0xa0/0xe0
[    0.000000]  kmem_cache_init+0xf8/0x204
[    0.000000]  start_kernel+0x3ec/0x668
[    0.000000]  __primary_switched+0xc0/0xc8
[    0.000000] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428)
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

Thanks!

> > Limit size of stack_table when using page allocator because vmalloc()
> > is also unavailable in kmem_cache_init(). it must not be larger than
> > (PAGE_SIZE << (MAX_ORDER - 1)).
> >
> > This patch was tested on both CONFIG_STACKDEPOT_ALWAYS_INIT=y and n.
> >
> > Fixes: 2dba5eb1c73b ("lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()")
> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > ---
> >  lib/stackdepot.c | 28 +++++++++++++++++++++-------
> >  1 file changed, 21 insertions(+), 7 deletions(-)
> >
> > diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> > index bf5ba9af0500..606f80ae2bf7 100644
> > --- a/lib/stackdepot.c
> > +++ b/lib/stackdepot.c
> > @@ -73,6 +73,14 @@ static int next_slab_inited;
> >  static size_t depot_offset;
> >  static DEFINE_RAW_SPINLOCK(depot_lock);
> >
> > +static unsigned int stack_hash_size = (1 << CONFIG_STACK_HASH_ORDER);
> > +static inline unsigned int stack_hash_mask(void)
> > +{
> > +       return stack_hash_size - 1;
> > +}
> > +
> > +#define STACK_HASH_SEED 0x9747b28c
> > +
> >  static bool init_stack_slab(void **prealloc)
> >  {
> >         if (!*prealloc)
> > @@ -142,10 +150,6 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
> >         return stack;
> >  }
> >
> > -#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER)
> > -#define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
> > -#define STACK_HASH_SEED 0x9747b28c
> > -
> >  static bool stack_depot_disable;
> >  static struct stack_record **stack_table;
> >
> > @@ -172,18 +176,28 @@ __ref int stack_depot_init(void)
> >
> >         mutex_lock(&stack_depot_init_mutex);
> >         if (!stack_depot_disable && !stack_table) {
> > -               size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
> > +               size_t size = (stack_hash_size * sizeof(struct stack_record *));
> >                 int i;
> >
> >                 if (slab_is_available()) {
> >                         pr_info("Stack Depot allocating hash table with kvmalloc\n");
> >                         stack_table = kvmalloc(size, GFP_KERNEL);
> > +               } else if (totalram_pages() > 0) {
> > +                       /* Reduce size because vmalloc may be unavailable */
> > +                       size = min(size, PAGE_SIZE << (MAX_ORDER - 1));
> > +                       stack_hash_size = size / sizeof(struct stack_record *);
> > +
> > +                       pr_info("Stack Depot allocating hash table with __get_free_pages\n");
> > +                       stack_table = (struct stack_record **)
> > +                                     __get_free_pages(GFP_KERNEL, get_order(size));
> >                 } else {
> >                         pr_info("Stack Depot allocating hash table with memblock_alloc\n");
> >                         stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
> >                 }
> > +
> >                 if (stack_table) {
> > -                       for (i = 0; i < STACK_HASH_SIZE;  i++)
> > +                       pr_info("Stack Depot hash table size=%u\n", stack_hash_size);
> > +                       for (i = 0; i < stack_hash_size;  i++)
> >                                 stack_table[i] = NULL;
> >                 } else {
> >                         pr_err("Stack Depot hash table allocation failed, disabling\n");
> > @@ -363,7 +377,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
> >                 goto fast_exit;
> >
> >         hash = hash_stack(entries, nr_entries);
> > -       bucket = &stack_table[hash & STACK_HASH_MASK];
> > +       bucket = &stack_table[hash & stack_hash_mask()];
> >
> >         /*
> >          * Fast path: look the stack trace up without locking.
> > --
> > 2.33.1

-- 
Thank you, You are awesome!
Hyeonggon :-)


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable
  2022-02-28 10:05       ` Hyeonggon Yoo
@ 2022-02-28 10:50         ` Marco Elver
  2022-02-28 11:48           ` Hyeonggon Yoo
  2022-02-28 15:09           ` [PATCH] mm/slub: initialize stack depot in boot process Hyeonggon Yoo
  0 siblings, 2 replies; 46+ messages in thread
From: Marco Elver @ 2022-02-28 10:50 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Vlastimil Babka, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton, linux-mm, patches,
	linux-kernel, Oliver Glitta, Faiyaz Mohammed, Dmitry Vyukov,
	Eric Dumazet, Jarkko Sakkinen, Johannes Berg, Yury Norov,
	Arnd Bergmann, James Bottomley, Matteo Croce, Andrey Konovalov,
	Imran Khan, Zqiang

On Mon, 28 Feb 2022 at 11:05, Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
[...]
> > This is odd - who is calling stack_depot_init() while neither slab nor
> > memblock are available?
>
> It's not merged yet - but Oliver's patch (2/5) in his series [1] does:
> If user is debugging cache, it calls stack_depot_init() when creating
> cache.
>
> > @@ -4221,6 +4220,9 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
> >       s->remote_node_defrag_ratio = 1000;
> >  #endif
> >
> > +     if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> > +             stack_depot_init();
> > +
>
> Oliver's patch series enables stack depot when arch supports stacktrace,
> to store slab objects' stack traces. (as slub debugging feature.)
>
> Because slub debugging is turned on by default, the commit 2dba5eb1c73b
> ("lib/stackdepot: allow optional init and stack_table allocation by
> kvmalloc()") made stack_depot_init() can be called later.
>
> With Oliver's patch applied, stack_depot_init() can be called in
> contexts below:
>
>   1) only memblock available (for kasan)
>   2) only buddy available, vmalloc/memblock unavailable (for boot caches)
>   3) buddy/slab available, vmalloc/memblock unavailable (vmap_area cache)
>   4) buddy/slab/vmalloc available, memblock unavailable (other caches)
>
> SLUB supports enabling debugging for specific cache by passing
> slub_debug boot parameter. As slab caches can be created in
> various context, stack_depot_init() should consider all contexts above.
>
> Writing this, I realized my patch does not handle case 3).. I'll send v3.
>
> [1] https://lore.kernel.org/linux-mm/YhoakP7Kih%2FYUgiN@ip-172-31-19-208.ap-northeast-1.compute.internal/T/#t
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
>
> > Do you have a stacktrace?
>
> Yeah, here:
>
> You can reproduce this on vbabka's slab-stackdepot-v1 branch [2] with
> slub_debug=U, and CONFIG_STACKDEPOT_ALWAYS_INIT=n
>
[...]
> [    0.000000] Call trace:
> [    0.000000]  __memset+0x16c/0x188
> [    0.000000]  stack_depot_init+0xc8/0x100
> [    0.000000]  __kmem_cache_create+0x454/0x570
> [    0.000000]  create_boot_cache+0xa0/0xe0

I think even before this point you have all the information required
to determine if stackdepot will be required. It's available after
setup_slub_debug().

So why can't you just call stack_depot_init() somewhere else and avoid
all this complexity?
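
E.g. very roughly (untested sketch; slub_debug is static in mm/slub.c so
the check would need a small helper there, and this ignores the per-cache
slub_debug=U,<cache> case - the names below are made up):

	/* mm/slub.c */
	void __init slub_stack_depot_init_early(void)
	{
		if (IS_ENABLED(CONFIG_STACKDEPOT) &&
		    (slub_debug & SLAB_STORE_USER))
			stack_depot_init();
	}

called from mm_init() before kmem_cache_init(), i.e. after the command
line (and thus setup_slub_debug()) has already been parsed.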

> [    0.000000]  kmem_cache_init+0xf8/0x204
> [    0.000000]  start_kernel+0x3ec/0x668
> [    0.000000]  __primary_switched+0xc0/0xc8
> [    0.000000] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428)
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> [    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable
  2022-02-28 10:50         ` Marco Elver
@ 2022-02-28 11:48           ` Hyeonggon Yoo
  2022-02-28 15:09           ` [PATCH] mm/slub: initialize stack depot in boot process Hyeonggon Yoo
  1 sibling, 0 replies; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-02-28 11:48 UTC (permalink / raw)
  To: Marco Elver
  Cc: Vlastimil Babka, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton, linux-mm, patches,
	linux-kernel, Oliver Glitta, Faiyaz Mohammed, Dmitry Vyukov,
	Eric Dumazet, Jarkko Sakkinen, Johannes Berg, Yury Norov,
	Arnd Bergmann, James Bottomley, Matteo Croce, Andrey Konovalov,
	Imran Khan, Zqiang

On Mon, Feb 28, 2022 at 11:50:49AM +0100, Marco Elver wrote:
> On Mon, 28 Feb 2022 at 11:05, Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
> [...]
> > > This is odd - who is calling stack_depot_init() while neither slab nor
> > > memblock are available?
> >
> > It's not merged yet - but Oliver's patch (2/5) in his series [1] does:
> > If user is debugging cache, it calls stack_depot_init() when creating
> > cache.
> >
> > > @@ -4221,6 +4220,9 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
> > >       s->remote_node_defrag_ratio = 1000;
> > >  #endif
> > >
> > > +     if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> > > +             stack_depot_init();
> > > +
> >
> > Oliver's patch series enables stack depot when arch supports stacktrace,
> > to store slab objects' stack traces. (as slub debugging feature.)
> >
> > Because slub debugging is turned on by default, the commit 2dba5eb1c73b
> > ("lib/stackdepot: allow optional init and stack_table allocation by
> > kvmalloc()") made stack_depot_init() can be called later.
> >
> > With Oliver's patch applied, stack_depot_init() can be called in
> > contexts below:
> >
> >   1) only memblock available (for kasan)
> >   2) only buddy available, vmalloc/memblock unavailable (for boot caches)
> >   3) buddy/slab available, vmalloc/memblock unavailable (vmap_area cache)
> >   4) buddy/slab/vmalloc available, memblock unavailable (other caches)
> >
> > SLUB supports enabling debugging for specific cache by passing
> > slub_debug boot parameter. As slab caches can be created in
> > various context, stack_depot_init() should consider all contexts above.
> >
> > Writing this, I realized my patch does not handle case 3).. I'll send v3.
> >
> > [1] https://lore.kernel.org/linux-mm/YhoakP7Kih%2FYUgiN@ip-172-31-19-208.ap-northeast-1.compute.internal/T/#t
> > [2] https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
> >
> > > Do you have a stacktrace?
> >
> > Yeah, here:
> >
> > You can reproduce this on vbabka's slab-stackdepot-v1 branch [2] with
> > slub_debug=U, and CONFIG_STACKDEPOT_ALWAYS_INIT=n
> >
> [...]
> > [    0.000000] Call trace:
> > [    0.000000]  __memset+0x16c/0x188
> > [    0.000000]  stack_depot_init+0xc8/0x100
> > [    0.000000]  __kmem_cache_create+0x454/0x570
> > [    0.000000]  create_boot_cache+0xa0/0xe0
> 
> I think even before this point you have all the information required
> to determine if stackdepot will be required. It's available after
> setup_slub_debug().
> 
> So why can't you just call stack_depot_init() somewhere else and avoid
> all this complexity?
>

You are right. That is much simpler and sounds good, as SLUB does not
support enabling the SLAB_STORE_USER flag once the system is up.

I'll try this approach.
Thank you!

> > [    0.000000]  kmem_cache_init+0xf8/0x204
> > [    0.000000]  start_kernel+0x3ec/0x668
> > [    0.000000]  __primary_switched+0xc0/0xc8
> > [    0.000000] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428)
> > [    0.000000] ---[ end trace 0000000000000000 ]---
> > [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> > [    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

-- 
Thank you, You are awesome!
Hyeonggon :-)


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] mm/slub: initialize stack depot in boot process
  2022-02-28 10:50         ` Marco Elver
  2022-02-28 11:48           ` Hyeonggon Yoo
@ 2022-02-28 15:09           ` Hyeonggon Yoo
  2022-02-28 16:28             ` Marco Elver
  2022-03-01  0:28             ` Vlastimil Babka
  1 sibling, 2 replies; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-02-28 15:09 UTC (permalink / raw)
  To: Marco Elver
  Cc: Vlastimil Babka, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton, linux-mm, patches,
	linux-kernel, Oliver Glitta, Faiyaz Mohammed, Dmitry Vyukov,
	Eric Dumazet, Jarkko Sakkinen, Johannes Berg, Yury Norov,
	Arnd Bergmann, James Bottomley, Matteo Croce, Andrey Konovalov,
	Imran Khan, Zqiang

commit ba10d4b46655 ("mm/slub: use stackdepot to save stack trace in
objects") initializes the stack depot while creating a cache if the
SLAB_STORE_USER flag is set.

This can crash the kernel, because a cache can be created in various
contexts. For example, if the user sets slub_debug=U, the kernel crashes
because create_boot_cache() calls stack_depot_init(), which tries to
allocate the hash table using memblock_alloc() since slab is not
available - but memblock is not available at that time either.

This patch solves the problem by initializing the stack depot early in
the boot process if the SLAB_STORE_USER debug flag is set globally or is
set for at least one cache.

[ elver@google.com: initialize the stack depot depending on the slub_debug
  parameter instead of allowing stack_depot_init() to be called from
  kmem_cache_init(), for simplicity. ]

Link: https://lkml.org/lkml/2022/2/28/238
Fixes: ba10d4b46655 ("mm/slub: use stackdepot to save stack trace in objects")
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 include/linux/slab.h |  1 +
 init/main.c          |  1 +
 mm/slab.c            |  4 ++++
 mm/slob.c            |  4 ++++
 mm/slub.c            | 28 +++++++++++++++++++++++++---
 5 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 37bde99b74af..023f3f71ae35 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -139,6 +139,7 @@ struct mem_cgroup;
 /*
  * struct kmem_cache related prototypes
  */
+void __init kmem_cache_init_early(void);
 void __init kmem_cache_init(void);
 bool slab_is_available(void);
 
diff --git a/init/main.c b/init/main.c
index 65fa2e41a9c0..4fdb7975a085 100644
--- a/init/main.c
+++ b/init/main.c
@@ -835,6 +835,7 @@ static void __init mm_init(void)
 	kfence_alloc_pool();
 	report_meminit();
 	stack_depot_early_init();
+	kmem_cache_init_early();
 	mem_init();
 	mem_init_print_info();
 	kmem_cache_init();
diff --git a/mm/slab.c b/mm/slab.c
index ddf5737c63d9..80a6d01aab06 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1196,6 +1196,10 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
 	}
 }
 
+void __init kmem_cache_init_early(void)
+{
+}
+
 /*
  * Initialisation.  Called after the page allocator have been initialised and
  * before smp_init().
diff --git a/mm/slob.c b/mm/slob.c
index 60c5842215f1..00e323af8be4 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -715,6 +715,10 @@ struct kmem_cache kmem_cache_boot = {
 	.align = ARCH_KMALLOC_MINALIGN,
 };
 
+void __init kmem_cache_init_early(void)
+{
+}
+
 void __init kmem_cache_init(void)
 {
 	kmem_cache = &kmem_cache_boot;
diff --git a/mm/slub.c b/mm/slub.c
index a74afe59a403..40bcd18143b6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4221,9 +4221,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
 	s->remote_node_defrag_ratio = 1000;
 #endif
 
-	if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
-		stack_depot_init();
-
 	/* Initialize the pre-computed randomized freelist if slab is up */
 	if (slab_state >= UP) {
 		if (init_cache_random_seq(s))
@@ -4810,6 +4807,31 @@ static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache)
 	return s;
 }
 
+/* Initialize stack depot if needed */
+void __init kmem_cache_init_early(void)
+{
+#ifdef CONFIG_STACKDEPOT
+	slab_flags_t block_flags;
+	char *next_block;
+	char *slab_list;
+
+	if (slub_debug & SLAB_STORE_USER)
+		goto init_stack_depot;
+
+	next_block = slub_debug_string;
+	while (next_block) {
+		next_block = parse_slub_debug_flags(next_block, &block_flags, &slab_list, false);
+		if (block_flags & SLAB_STORE_USER)
+			goto init_stack_depot;
+	}
+
+	return;
+
+init_stack_depot:
+	stack_depot_init();
+#endif
+}
+
 void __init kmem_cache_init(void)
 {
 	static __initdata struct kmem_cache boot_kmem_cache,
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] mm/slub: initialize stack depot in boot process
  2022-02-28 15:09           ` [PATCH] mm/slub: initialize stack depot in boot process Hyeonggon Yoo
@ 2022-02-28 16:28             ` Marco Elver
  2022-03-01  2:12               ` Hyeonggon Yoo
  2022-03-01  0:28             ` Vlastimil Babka
  1 sibling, 1 reply; 46+ messages in thread
From: Marco Elver @ 2022-02-28 16:28 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Vlastimil Babka, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton, linux-mm, patches,
	linux-kernel, Oliver Glitta, Faiyaz Mohammed, Dmitry Vyukov,
	Eric Dumazet, Jarkko Sakkinen, Johannes Berg, Yury Norov,
	Arnd Bergmann, James Bottomley, Matteo Croce, Andrey Konovalov,
	Imran Khan, Zqiang

On Mon, Feb 28, 2022 at 03:09PM +0000, Hyeonggon Yoo wrote:
> commit ba10d4b46655 ("mm/slub: use stackdepot to save stack trace in
> objects") initializes stack depot while creating cache if SLAB_STORE_USER
> flag is set.
> 
> This can make kernel crash because a cache can be created in various
> contexts. For example if user sets slub_debug=U, kernel crashes
> because create_boot_cache() calls stack_depot_init(), which tries to
> allocate hash table using memblock_alloc() if slab is not available.
> But memblock is also not available at that time.
> 
> This patch solves the problem by initializing stack depot early
> in boot process if SLAB_STORE_USER debug flag is set globally
> or the flag is set to at least one cache.
> 
> [ elver@google.com: initialize stack depot depending on slub_debug
>   parameter instead of allowing stack_depot_init() can be called
>   in kmem_cache_init() for simplicity. ]
> 
> Link: https://lkml.org/lkml/2022/2/28/238

This would be a better permalink:
https://lore.kernel.org/all/YhyeaP8lrzKgKm5A@ip-172-31-19-208.ap-northeast-1.compute.internal/

> Fixes: ba10d4b46655 ("mm/slub: use stackdepot to save stack trace in objects")

This commit does not exist in -next.

I assume you intend that "lib/stackdepot: Use page allocator if both
slab and memblock is unavailable" should be dropped now.

> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> ---
>  include/linux/slab.h |  1 +
>  init/main.c          |  1 +
>  mm/slab.c            |  4 ++++
>  mm/slob.c            |  4 ++++
>  mm/slub.c            | 28 +++++++++++++++++++++++++---
>  5 files changed, 35 insertions(+), 3 deletions(-)
[...]
>  
> +/* Initialize stack depot if needed */
> +void __init kmem_cache_init_early(void)
> +{
> +#ifdef CONFIG_STACKDEPOT
> +	slab_flags_t block_flags;
> +	char *next_block;
> +	char *slab_list;
> +
> +	if (slub_debug & SLAB_STORE_USER)
> +		goto init_stack_depot;
> +
> +	next_block = slub_debug_string;
> +	while (next_block) {
> +		next_block = parse_slub_debug_flags(next_block, &block_flags, &slab_list, false);
> +		if (block_flags & SLAB_STORE_USER)
> +			goto init_stack_depot;
> +	}
> +
> +	return;
> +
> +init_stack_depot:
> +	stack_depot_init();
> +#endif
> +}

You can simplify this function to avoid the goto:

	/* Initialize stack depot if needed */
	void __init kmem_cache_init_early(void)
	{
	#ifdef CONFIG_STACKDEPOT
		slab_flags_t flags = slub_debug;
		char *next_block = slub_debug_string;
		char *slab_list;
	
		for (;;) {
			if (flags & SLAB_STORE_USER) {
				stack_depot_init();
				break;
			}
			if (!next_block)
				break;
			next_block = parse_slub_debug_flags(next_block, &flags, &slab_list, false);
		}
	#endif
	}

^^ with this version, it'd also be much easier and less confusing to add
other initialization logic unrelated to stackdepot later after the loop
(should it ever be required).


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects
  2022-02-26 10:24   ` Hyeonggon Yoo
@ 2022-02-28 18:44     ` Vlastimil Babka
  0 siblings, 0 replies; 46+ messages in thread
From: Vlastimil Babka @ 2022-02-28 18:44 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Imran Khan

On 2/26/22 11:24, Hyeonggon Yoo wrote:
> On Fri, Feb 25, 2022 at 07:03:15PM +0100, Vlastimil Babka wrote:
>> From: Oliver Glitta <glittao@gmail.com>
>> 
>> Many stack traces are similar so there are many similar arrays.
>> Stackdepot saves each unique stack only once.
>> 
>> Replace field addrs in struct track with depot_stack_handle_t handle.  Use
>> stackdepot to save stack trace.
>> 
>> The benefits are smaller memory overhead and possibility to aggregate
>> per-cache statistics in the following patch using the stackdepot handle
>> instead of matching stacks manually.
>> 
>> [ vbabka@suse.cz: rebase to 5.17-rc1 and adjust accordingly ]
>> 
>> This was initially merged as commit 788691464c29 and reverted by commit
>> ae14c63a9f20 due to several issues, that should now be fixed.
>> The problem of unconditional memory overhead by stackdepot has been
>> addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init
>> and stack_table allocation by kvmalloc()"), so the dependency on
>> stackdepot will result in extra memory usage only when a slab cache
>> tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
>> The build failures on some architectures were also addressed, and the
>> reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this
>> patch.
>> 
>> Signed-off-by: Oliver Glitta <glittao@gmail.com>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: Christoph Lameter <cl@linux.com>
>> Cc: Pekka Enberg <penberg@kernel.org>
>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> ---
>>  init/Kconfig |  1 +
>>  mm/slub.c    | 88 +++++++++++++++++++++++++++++-----------------------
>>  2 files changed, 50 insertions(+), 39 deletions(-)
>> 
>> diff --git a/init/Kconfig b/init/Kconfig
>> index e9119bf54b1f..b21dd3a4a106 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -1871,6 +1871,7 @@ config SLUB_DEBUG
>>  	default y
>>  	bool "Enable SLUB debugging support" if EXPERT
>>  	depends on SLUB && SYSFS
>> +	select STACKDEPOT if STACKTRACE_SUPPORT
>>  	help
>>  	  SLUB has extensive debug support features. Disabling these can
>>  	  result in significant savings in code size. This also disables
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 1fc451f4fe62..3140f763e819 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -26,6 +26,7 @@
>>  #include <linux/cpuset.h>
>>  #include <linux/mempolicy.h>
>>  #include <linux/ctype.h>
>> +#include <linux/stackdepot.h>
>>  #include <linux/debugobjects.h>
>>  #include <linux/kallsyms.h>
>>  #include <linux/kfence.h>
>> @@ -264,8 +265,8 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
>>  #define TRACK_ADDRS_COUNT 16
>>  struct track {
>>  	unsigned long addr;	/* Called from address */
>> -#ifdef CONFIG_STACKTRACE
>> -	unsigned long addrs[TRACK_ADDRS_COUNT];	/* Called from address */
>> +#ifdef CONFIG_STACKDEPOT
>> +	depot_stack_handle_t handle;
>>  #endif
>>  	int cpu;		/* Was running on cpu */
>>  	int pid;		/* Pid context */
>> @@ -724,22 +725,20 @@ static struct track *get_track(struct kmem_cache *s, void *object,
>>  	return kasan_reset_tag(p + alloc);
>>  }
>>  
>> -static void set_track(struct kmem_cache *s, void *object,
>> -			enum track_item alloc, unsigned long addr)
>> +static noinline void
>> +set_track(struct kmem_cache *s, void *object, enum track_item alloc,
>> +	  unsigned long addr, gfp_t flags)
>>  {
>>  	struct track *p = get_track(s, object, alloc);
>>  
>> -#ifdef CONFIG_STACKTRACE
>> +#ifdef CONFIG_STACKDEPOT
>> +	unsigned long entries[TRACK_ADDRS_COUNT];
>>  	unsigned int nr_entries;
>>  
>> -	metadata_access_enable();
>> -	nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
>> -				      TRACK_ADDRS_COUNT, 3);
>> -	metadata_access_disable();
>> -
>> -	if (nr_entries < TRACK_ADDRS_COUNT)
>> -		p->addrs[nr_entries] = 0;
>> +	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);
>> +	p->handle = stack_depot_save(entries, nr_entries, flags);
>>  #endif
>> +
>>  	p->addr = addr;
>>  	p->cpu = smp_processor_id();
>>  	p->pid = current->pid;
>> @@ -759,20 +758,19 @@ static void init_tracking(struct kmem_cache *s, void *object)
>>  
>>  static void print_track(const char *s, struct track *t, unsigned long pr_time)
>>  {
>> +	depot_stack_handle_t handle __maybe_unused;
>> +
>>  	if (!t->addr)
>>  		return;
>>  
>>  	pr_err("%s in %pS age=%lu cpu=%u pid=%d\n",
>>  	       s, (void *)t->addr, pr_time - t->when, t->cpu, t->pid);
>> -#ifdef CONFIG_STACKTRACE
>> -	{
>> -		int i;
>> -		for (i = 0; i < TRACK_ADDRS_COUNT; i++)
>> -			if (t->addrs[i])
>> -				pr_err("\t%pS\n", (void *)t->addrs[i]);
>> -			else
>> -				break;
>> -	}
>> +#ifdef CONFIG_STACKDEPOT
>> +	handle = READ_ONCE(t->handle);
>> +	if (handle)
>> +		stack_depot_print(handle);
>> +	else
>> +		pr_err("object allocation/free stack trace missing\n");
>>  #endif
>>  }
>>  
>> @@ -1304,9 +1302,9 @@ static inline int alloc_consistency_checks(struct kmem_cache *s,
>>  	return 1;
>>  }
>>  
>> -static noinline int alloc_debug_processing(struct kmem_cache *s,
>> -					struct slab *slab,
>> -					void *object, unsigned long addr)
>> +static noinline int
>> +alloc_debug_processing(struct kmem_cache *s, struct slab *slab, void *object,
>> +		       unsigned long addr, gfp_t flags)
>>  {
>>  	if (s->flags & SLAB_CONSISTENCY_CHECKS) {
>>  		if (!alloc_consistency_checks(s, slab, object))
>> @@ -1315,7 +1313,7 @@ static noinline int alloc_debug_processing(struct kmem_cache *s,
>>  
>>  	/* Success perform special debug activities for allocs */
>>  	if (s->flags & SLAB_STORE_USER)
>> -		set_track(s, object, TRACK_ALLOC, addr);
>> +		set_track(s, object, TRACK_ALLOC, addr, flags);
> 
> I see warning because of this.
> We should not reuse flags here because alloc_debug_processing() can be
> called with preemption disabled, and caller specified GFP_KERNEL.

Ugh, thanks for catching this, looks like I forgot to test with the necessary
config options. Indeed, the previous version of this patch, commit
788691464c29, used GFP_NOWAIT. I took the idea of passing the allocation
gfpflags from Imran's version (another Cc I forgot, sorry)
https://lore.kernel.org/all/20210831062539.898293-3-imran.f.khan@oracle.com/
...

> [    2.015902] BUG: sleeping function called from invalid context at mm/page_alloc.c:5164
> [    2.022052] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
> [    2.028357] preempt_count: 1, expected: 0
> [    2.031508] RCU nest depth: 0, expected: 0
> [    2.034722] 1 lock held by swapper/0/1:
> [    2.037905]  #0: ffff00000488f4d0 (&sb->s_type->i_mutex_key#5){+.+.}-{4:4}, at: start_creating+0x58/0x130
> [    2.045393] Preemption disabled at:
> [    2.045400] [<ffff8000083bd008>] __slab_alloc.constprop.0+0x38/0xc0

... but indeed __slab_alloc() disables preemption so that won't work, and we
can only safely use GFP_NOWAIT. Will fix in v2, thanks.
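
For concreteness, one possible shape of that fix (just a sketch, not the
actual v2 change) would be to stop forwarding the caller's gfpflags on the
alloc path and pass GFP_NOWAIT there as well, matching what
free_debug_processing() already does:

	/*
	 * In alloc_debug_processing(): preemption is disabled here via
	 * ___slab_alloc(), so only GFP_NOWAIT is safe for stack_depot_save().
	 */
	if (s->flags & SLAB_STORE_USER)
		set_track(s, object, TRACK_ALLOC, addr, GFP_NOWAIT);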

> [    2.053039] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.17.0-rc5+ #105
> [    2.059365] Hardware name: linux,dummy-virt (DT)
> [    2.063160] Call trace:
> [    2.065217]  dump_backtrace+0xf8/0x130
> [    2.068350]  show_stack+0x24/0x80
> [    2.071104]  dump_stack_lvl+0x9c/0xd8
> [    2.074140]  dump_stack+0x18/0x34
> [    2.076894]  __might_resched+0x1a0/0x280
> [    2.080146]  __might_sleep+0x58/0x90
> [    2.083108]  prepare_alloc_pages.constprop.0+0x1b4/0x1f0
> [    2.087468]  __alloc_pages+0x88/0x1e0
> [    2.090502]  alloc_page_interleave+0x24/0xb4
> [    2.094021]  alloc_pages+0x10c/0x170
> [    2.096984]  __stack_depot_save+0x3e0/0x4e0
> [    2.100446]  stack_depot_save+0x14/0x20
> [    2.103617]  set_track.isra.0+0x64/0xa4
> [    2.106787]  alloc_debug_processing+0x11c/0x1e0
> [    2.110532]  ___slab_alloc+0x3e8/0x750
> [    2.113643]  __slab_alloc.constprop.0+0x64/0xc0
> [    2.117391]  kmem_cache_alloc+0x304/0x350
> [    2.120702]  security_inode_alloc+0x38/0xa4
> [    2.124169]  inode_init_always+0xd0/0x264
> [    2.127501]  alloc_inode+0x44/0xec
> [    2.130325]  new_inode+0x28/0xc0
> [    2.133011]  tracefs_create_file+0x74/0x1e0
> [    2.136459]  init_tracer_tracefs+0x248/0x644
> [    2.140030]  tracer_init_tracefs+0x9c/0x34c
> [    2.143483]  do_one_initcall+0x44/0x170
> [    2.146654]  do_initcalls+0x104/0x144
> [    2.149704]  kernel_init_freeable+0x130/0x178
> 
> [...]
> 
>>  	trace(s, slab, object, 1);
>>  	init_object(s, object, SLUB_RED_ACTIVE);
>>  	return 1;
>> @@ -1395,7 +1393,7 @@ static noinline int free_debug_processing(
>>  	}
>>  
>>  	if (s->flags & SLAB_STORE_USER)
>> -		set_track(s, object, TRACK_FREE, addr);
>> +		set_track(s, object, TRACK_FREE, addr, GFP_NOWAIT);
>>  	trace(s, slab, object, 0);
>>  	/* Freepointer not overwritten by init_object(), SLAB_POISON moved it */
>>  	init_object(s, object, SLUB_RED_INACTIVE);
>> @@ -1632,7 +1630,8 @@ static inline
>>  void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}
>>  
>>  static inline int alloc_debug_processing(struct kmem_cache *s,
>> -	struct slab *slab, void *object, unsigned long addr) { return 0; }
>> +	struct slab *slab, void *object, unsigned long addr,
>> +	gfp_t flags) { return 0; }
>>  
>>  static inline int free_debug_processing(
>>  	struct kmem_cache *s, struct slab *slab,
>> @@ -3033,7 +3032,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  check_new_slab:
>>  
>>  	if (kmem_cache_debug(s)) {
>> -		if (!alloc_debug_processing(s, slab, freelist, addr)) {
>> +		if (!alloc_debug_processing(s, slab, freelist, addr, gfpflags)) {
>>  			/* Slab failed checks. Next slab needed */
>>  			goto new_slab;
>>  		} else {
>> @@ -4221,6 +4220,9 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
>>  	s->remote_node_defrag_ratio = 1000;
>>  #endif
>>  
>> +	if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
>> +		stack_depot_init();
>> +
> 
> As mentioned in my report, it can crash system when creating boot caches
> with debugging enabled.
> 
> The rest looks fine!
> 
>>  	/* Initialize the pre-computed randomized freelist if slab is up */
>>  	if (slab_state >= UP) {
>>  		if (init_cache_random_seq(s))
>> @@ -4352,18 +4354,26 @@ void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
>>  	objp = fixup_red_left(s, objp);
>>  	trackp = get_track(s, objp, TRACK_ALLOC);
>>  	kpp->kp_ret = (void *)trackp->addr;
>> -#ifdef CONFIG_STACKTRACE
>> -	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
>> -		kpp->kp_stack[i] = (void *)trackp->addrs[i];
>> -		if (!kpp->kp_stack[i])
>> -			break;
>> -	}
>> +#ifdef CONFIG_STACKDEPOT
>> +	{
>> +		depot_stack_handle_t handle;
>> +		unsigned long *entries;
>> +		unsigned int nr_entries;
>> +
>> +		handle = READ_ONCE(trackp->handle);
>> +		if (handle) {
>> +			nr_entries = stack_depot_fetch(handle, &entries);
>> +			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
>> +				kpp->kp_stack[i] = (void *)entries[i];
>> +		}
>>  
>> -	trackp = get_track(s, objp, TRACK_FREE);
>> -	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
>> -		kpp->kp_free_stack[i] = (void *)trackp->addrs[i];
>> -		if (!kpp->kp_free_stack[i])
>> -			break;
>> +		trackp = get_track(s, objp, TRACK_FREE);
>> +		handle = READ_ONCE(trackp->handle);
>> +		if (handle) {
>> +			nr_entries = stack_depot_fetch(handle, &entries);
>> +			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
>> +				kpp->kp_free_stack[i] = (void *)entries[i];
>> +		}
>>  	}
>>  #endif
>>  #endif
>> -- 
>> 2.35.1
>> 
>> 
> 



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot
  2022-02-26  7:19 ` [PATCH 0/5] SLUB debugfs improvements based on stackdepot Hyeonggon Yoo
@ 2022-02-28 19:10   ` Vlastimil Babka
  2022-02-28 20:01     ` Mike Rapoport
  2022-02-28 21:27     ` Hyeonggon Yoo
  0 siblings, 2 replies; 46+ messages in thread
From: Vlastimil Babka @ 2022-02-28 19:10 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet, Randy Dunlap,
	Marco Elver, Karolina Drobnik, Mike Rapoport

On 2/26/22 08:19, Hyeonggon Yoo wrote:
> On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
>> Hi,
>> 
>> this series combines and revives patches from Oliver's last year
>> bachelor thesis (where I was the advisor) that make SLUB's debugfs
>> files alloc_traces and free_traces more useful.
>> The resubmission was blocked on stackdepot changes that are now merged,
>> as explained in patch 2.
>> 
> 
> Hello. I just started review/testing this series.
> 
> it crashed on my system (arm64)

Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
from memblock. arm64 must have memblock freeing happen earlier or something.
(CCing memblock experts)

> I ran with boot parameter slub_debug=U, and without KASAN.
> So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> 
> void * __init memblock_alloc_try_nid(
>                         phys_addr_t size, phys_addr_t align,
>                         phys_addr_t min_addr, phys_addr_t max_addr,
>                         int nid)
> {
>         void *ptr;
> 
>         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
>                      __func__, (u64)size, (u64)align, nid, &min_addr,
>                      &max_addr, (void *)_RET_IP_);
>         ptr = memblock_alloc_internal(size, align,
>                                            min_addr, max_addr, nid, false);
>         if (ptr)
>                 memset(ptr, 0, size); <--- Crash Here
> 
>         return ptr;
> }
> 
> It crashed during create_boot_cache() -> stack_depot_init() ->
> memblock_alloc().
> 
> I think That's because, in kmem_cache_init(), both slab and memblock is not
> available. (AFAIU memblock is not available after mem_init() because of
> memblock_free_all(), right?)

Hm yes I see, even in the x86_64 version mem_init() calls memblock_free_all().
But then, I would expect stack_depot_init() to detect that memblock_alloc()
returns NULL, print "Stack Depot hash table allocation failed, disabling"
and disable it. Instead it seems memblock_alloc() returns something that is
already potentially used by somebody else? Sounds like a bug?

> Thanks!
> 
> /*
>  * Set up kernel memory allocators
>  */
> static void __init mm_init(void)
> {
>         /*
>          * page_ext requires contiguous pages,
>          * bigger than MAX_ORDER unless SPARSEMEM.
>          */
>         page_ext_init_flatmem();
>         init_mem_debugging_and_hardening();
>         kfence_alloc_pool();
>         report_meminit();
>         stack_depot_early_init();
>         mem_init();
>         mem_init_print_info();
>         kmem_cache_init();
>         /*
>          * page_owner must be initialized after buddy is ready, and also after
>          * slab is ready so that stack_depot_init() works properly
>          */)
> 
>> Patch 1 is a new preparatory cleanup.
>> 
>> Patch 2 originally submitted here [1], was merged to mainline but
>> reverted for stackdepot related issues as explained in the patch.
>> 
>> Patches 3-5 originally submitted as RFC here [2]. In this submission I
>> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
>> be considered too intrusive so I will postpone it for later. The docs
>> patch is adjusted accordingly.
>> 
>> Also available in git, based on v5.17-rc1:
>> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
>> 
>> I'd like to ask for some review before I add this to the slab tree.
>> 
>> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
>> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
>> 
>> Oliver Glitta (4):
>>   mm/slub: use stackdepot to save stack trace in objects
>>   mm/slub: aggregate and print stack traces in debugfs files
>>   mm/slub: sort debugfs output by frequency of stack traces
>>   slab, documentation: add description of debugfs files for SLUB caches
>> 
>> Vlastimil Babka (1):
>>   mm/slub: move struct track init out of set_track()
>> 
>>  Documentation/vm/slub.rst |  61 +++++++++++++++
>>  init/Kconfig              |   1 +
>>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
>>  3 files changed, 162 insertions(+), 52 deletions(-)
>> 
>> -- 
>> 2.35.1
>> 
>> 
> 



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot
  2022-02-28 19:10   ` Vlastimil Babka
@ 2022-02-28 20:01     ` Mike Rapoport
  2022-02-28 21:20       ` Hyeonggon Yoo
  2022-02-28 23:38       ` Vlastimil Babka
  2022-02-28 21:27     ` Hyeonggon Yoo
  1 sibling, 2 replies; 46+ messages in thread
From: Mike Rapoport @ 2022-02-28 20:01 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Hyeonggon Yoo, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton, linux-mm, patches,
	linux-kernel, Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet,
	Randy Dunlap, Marco Elver, Karolina Drobnik

On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> On 2/26/22 08:19, Hyeonggon Yoo wrote:
> > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> >> Hi,
> >> 
> >> this series combines and revives patches from Oliver's last year
> >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> >> files alloc_traces and free_traces more useful.
> >> The resubmission was blocked on stackdepot changes that are now merged,
> >> as explained in patch 2.
> >> 
> > 
> > Hello. I just started review/testing this series.
> > 
> > it crashed on my system (arm64)
> 
> Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> from memblock. arm64 must have memblock freeing happen earlier or something.
> (CCing memblock experts)
> 
> > I ran with boot parameter slub_debug=U, and without KASAN.
> > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> > 
> > void * __init memblock_alloc_try_nid(
> >                         phys_addr_t size, phys_addr_t align,
> >                         phys_addr_t min_addr, phys_addr_t max_addr,
> >                         int nid)
> > {
> >         void *ptr;
> > 
> >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> >                      __func__, (u64)size, (u64)align, nid, &min_addr,
> >                      &max_addr, (void *)_RET_IP_);
> >         ptr = memblock_alloc_internal(size, align,
> >                                            min_addr, max_addr, nid, false);
> >         if (ptr)
> >                 memset(ptr, 0, size); <--- Crash Here
> > 
> >         return ptr;
> > }
> > 
> > It crashed during create_boot_cache() -> stack_depot_init() ->
> > memblock_alloc().
> > 
> > I think That's because, in kmem_cache_init(), both slab and memblock is not
> > available. (AFAIU memblock is not available after mem_init() because of
> > memblock_free_all(), right?)
> 
> Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> But then, I would expect stack_depot_init() to detect that memblock_alloc()
> returns NULL, we print ""Stack Depot hash table allocation failed,
> disabling" and disable it. Instead it seems memblock_alloc() returns
> something that's already potentially used by somebody else? Sounds like a bug?

If stack_depot_init() is called from kmem_cache_init(), there will be
confusion about which allocator should be used, because both
stack_depot_init() and memblock use slab_is_available() to decide when to
stop using memblock and start using kmalloc() instead.
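
For reference, both sides currently key off the same predicate; roughly
(simplified from 5.17-rc1, not a verbatim quote):

	/* lib/stackdepot.c, stack_depot_init(), simplified */
	if (slab_is_available())
		stack_table = kvmalloc(size, GFP_KERNEL);
	else
		stack_table = memblock_alloc(size, SMP_CACHE_BYTES);

	/* mm/memblock.c, memblock_alloc_internal(), simplified */
	if (WARN_ON_ONCE(slab_is_available()))
		return kzalloc_node(size, GFP_NOWAIT, nid);

So in the window between memblock_free_all() and slab_is_available()
becoming true, both still take the memblock path even though the memory has
already been released to the buddy allocator.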

Hyeonggon, did you run your tests with panic_on_warn by any chance?
 
> > Thanks!
> > 
> > /*
> >  * Set up kernel memory allocators
> >  */
> > static void __init mm_init(void)
> > {
> >         /*
> >          * page_ext requires contiguous pages,
> >          * bigger than MAX_ORDER unless SPARSEMEM.
> >          */
> >         page_ext_init_flatmem();
> >         init_mem_debugging_and_hardening();
> >         kfence_alloc_pool();
> >         report_meminit();
> >         stack_depot_early_init();
> >         mem_init();
> >         mem_init_print_info();
> >         kmem_cache_init();
> >         /*
> >          * page_owner must be initialized after buddy is ready, and also after
> >          * slab is ready so that stack_depot_init() works properly
> >          */)
> > 
> >> Patch 1 is a new preparatory cleanup.
> >> 
> >> Patch 2 originally submitted here [1], was merged to mainline but
> >> reverted for stackdepot related issues as explained in the patch.
> >> 
> >> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> >> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> >> be considered too intrusive so I will postpone it for later. The docs
> >> patch is adjusted accordingly.
> >> 
> >> Also available in git, based on v5.17-rc1:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
> >> 
> >> I'd like to ask for some review before I add this to the slab tree.
> >> 
> >> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
> >> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
> >> 
> >> Oliver Glitta (4):
> >>   mm/slub: use stackdepot to save stack trace in objects
> >>   mm/slub: aggregate and print stack traces in debugfs files
> >>   mm/slub: sort debugfs output by frequency of stack traces
> >>   slab, documentation: add description of debugfs files for SLUB caches
> >> 
> >> Vlastimil Babka (1):
> >>   mm/slub: move struct track init out of set_track()
> >> 
> >>  Documentation/vm/slub.rst |  61 +++++++++++++++
> >>  init/Kconfig              |   1 +
> >>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
> >>  3 files changed, 162 insertions(+), 52 deletions(-)
> >> 
> >> -- 
> >> 2.35.1
> >> 
> >> 
> > 
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot
  2022-02-28 20:01     ` Mike Rapoport
@ 2022-02-28 21:20       ` Hyeonggon Yoo
  2022-02-28 23:38       ` Vlastimil Babka
  1 sibling, 0 replies; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-02-28 21:20 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Vlastimil Babka, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton, linux-mm, patches,
	linux-kernel, Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet,
	Randy Dunlap, Marco Elver, Karolina Drobnik

On Mon, Feb 28, 2022 at 10:01:29PM +0200, Mike Rapoport wrote:
> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> > On 2/26/22 08:19, Hyeonggon Yoo wrote:
> > > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> > >> Hi,
> > >> 
> > >> this series combines and revives patches from Oliver's last year
> > >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> > >> files alloc_traces and free_traces more useful.
> > >> The resubmission was blocked on stackdepot changes that are now merged,
> > >> as explained in patch 2.
> > >> 
> > > 
> > > Hello. I just started review/testing this series.
> > > 
> > > it crashed on my system (arm64)
> > 
> > Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> > from memblock. arm64 must have memblock freeing happen earlier or something.
> > (CCing memblock experts)
> > 
> > > I ran with boot parameter slub_debug=U, and without KASAN.
> > > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> > > 
> > > void * __init memblock_alloc_try_nid(
> > >                         phys_addr_t size, phys_addr_t align,
> > >                         phys_addr_t min_addr, phys_addr_t max_addr,
> > >                         int nid)
> > > {
> > >         void *ptr;
> > > 
> > >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> > >                      __func__, (u64)size, (u64)align, nid, &min_addr,
> > >                      &max_addr, (void *)_RET_IP_);
> > >         ptr = memblock_alloc_internal(size, align,
> > >                                            min_addr, max_addr, nid, false);
> > >         if (ptr)
> > >                 memset(ptr, 0, size); <--- Crash Here
> > > 
> > >         return ptr;
> > > }
> > > 
> > > It crashed during create_boot_cache() -> stack_depot_init() ->
> > > memblock_alloc().
> > > 
> > > I think That's because, in kmem_cache_init(), both slab and memblock is not
> > > available. (AFAIU memblock is not available after mem_init() because of
> > > memblock_free_all(), right?)
> > 
> > Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> > But then, I would expect stack_depot_init() to detect that memblock_alloc()
> > returns NULL, we print ""Stack Depot hash table allocation failed,
> > disabling" and disable it. Instead it seems memblock_alloc() returns
> > something that's already potentially used by somebody else? Sounds like a bug?
> 

It's really weird, but memblock_alloc() did not fail after
memblock_free_all(). It just crashed while initializing the memory returned
by memblock.

> If stack_depot_init() is called from kmem_cache_init(), there will be a
> confusion what allocator should be used because we use slab_is_available()
> to stop using memblock and start using kmalloc() instead in both
> stack_depot_init() and in memblock.
> 
> Hyeonggon, did you run your tests with panic on warn at any chance?
>

Yeah, I think this stack trace would help:

[    0.000000] Stack Depot allocating hash table with memblock_alloc
[    0.000000] Unable to handle kernel paging request at virtual address ffff000097400000
[    0.000000] Mem abort info:
[    0.000000]   ESR = 0x96000047
[    0.000000]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.000000]   SET = 0, FnV = 0
[    0.000000]   EA = 0, S1PTW = 0
[    0.000000]   FSC = 0x07: level 3 translation fault
[    0.000000] Data abort info:
[    0.000000]   ISV = 0, ISS = 0x00000047
[    0.000000]   CM = 0, WnR = 1
[    0.000000] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041719000
[    0.000000] [ffff000097400000] pgd=18000000dcff8003, p4d=18000000dcff8003, pud=18000000dcbfe003, pmd=18000000dcb43003, pte=00680000d7400706
[    0.000000] Internal error: Oops: 96000047 [#1] PREEMPT SMP
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-rc1-11918-gbf5d03166d75 #51
[    0.000000] Hardware name: linux,dummy-virt (DT)
[    0.000000] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    0.000000] pc : __memset+0x16c/0x188
[    0.000000] lr : memblock_alloc_try_nid+0xcc/0xe4
[    0.000000] sp : ffff800009a33cd0
[    0.000000] x29: ffff800009a33cd0 x28: 0000000041720018 x27: ffff800009362640
[    0.000000] x26: ffff800009362640 x25: 0000000000000000 x24: 0000000000000000
[    0.000000] x23: 0000000000002000 x22: ffff80000932bb50 x21: 00000000ffffffff
[    0.000000] x20: ffff000097400000 x19: 0000000000800000 x18: ffffffffffffffff
[    0.000000] x17: 373578302f383278 x16: 302b657461657263 x15: 0000001000000000
[    0.000000] x14: 0000000000000360 x13: 0000000000009f8c x12: 00000000dcb0c070
[    0.000000] x11: 0000001000000000 x10: 00000000004ea000 x9 : 0000000000000000
[    0.000000] x8 : ffff000097400000 x7 : 0000000000000000 x6 : 000000000000003f
[    0.000000] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000004
[    0.000000] x2 : 00000000007fffc0 x1 : 0000000000000000 x0 : ffff000097400000
[    0.000000] Call trace:
[    0.000000]  __memset+0x16c/0x188
[    0.000000]  stack_depot_init+0xc8/0x100
[    0.000000]  __kmem_cache_create+0x454/0x570
[    0.000000]  create_boot_cache+0xa0/0xe0
[    0.000000]  kmem_cache_init+0xf8/0x204
[    0.000000]  start_kernel+0x3ec/0x668
[    0.000000]  __primary_switched+0xc0/0xc8
[    0.000000] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428)
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---


Thanks!

> > > Thanks!
> > > 
> > > /*
> > >  * Set up kernel memory allocators
> > >  */
> > > static void __init mm_init(void)
> > > {
> > >         /*
> > >          * page_ext requires contiguous pages,
> > >          * bigger than MAX_ORDER unless SPARSEMEM.
> > >          */
> > >         page_ext_init_flatmem();
> > >         init_mem_debugging_and_hardening();
> > >         kfence_alloc_pool();
> > >         report_meminit();
> > >         stack_depot_early_init();
> > >         mem_init();
> > >         mem_init_print_info();
> > >         kmem_cache_init();
> > >         /*
> > >          * page_owner must be initialized after buddy is ready, and also after
> > >          * slab is ready so that stack_depot_init() works properly
> > >          */)
> > > 
> > >> Patch 1 is a new preparatory cleanup.
> > >> 
> > >> Patch 2 originally submitted here [1], was merged to mainline but
> > >> reverted for stackdepot related issues as explained in the patch.
> > >> 
> > >> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> > >> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> > >> be considered too intrusive so I will postpone it for later. The docs
> > >> patch is adjusted accordingly.
> > >> 
> > >> Also available in git, based on v5.17-rc1:
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
> > >> 
> > >> I'd like to ask for some review before I add this to the slab tree.
> > >> 
> > >> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
> > >> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
> > >> 
> > >> Oliver Glitta (4):
> > >>   mm/slub: use stackdepot to save stack trace in objects
> > >>   mm/slub: aggregate and print stack traces in debugfs files
> > >>   mm/slub: sort debugfs output by frequency of stack traces
> > >>   slab, documentation: add description of debugfs files for SLUB caches
> > >> 
> > >> Vlastimil Babka (1):
> > >>   mm/slub: move struct track init out of set_track()
> > >> 
> > >>  Documentation/vm/slub.rst |  61 +++++++++++++++
> > >>  init/Kconfig              |   1 +
> > >>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
> > >>  3 files changed, 162 insertions(+), 52 deletions(-)
> > >> 
> > >> -- 
> > >> 2.35.1
> > >> 
> > >> 
> > > 
> > 
> 
> -- 
> Sincerely yours,
> Mike.

-- 
Thank you, You are awesome!
Hyeonggon :-)


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot
  2022-02-28 19:10   ` Vlastimil Babka
  2022-02-28 20:01     ` Mike Rapoport
@ 2022-02-28 21:27     ` Hyeonggon Yoo
  2022-03-01  9:23       ` Mike Rapoport
  2022-03-02  8:37       ` Mike Rapoport
  1 sibling, 2 replies; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-02-28 21:27 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet, Randy Dunlap,
	Marco Elver, Karolina Drobnik, Mike Rapoport

On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> On 2/26/22 08:19, Hyeonggon Yoo wrote:
> > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> >> Hi,
> >> 
> >> this series combines and revives patches from Oliver's last year
> >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> >> files alloc_traces and free_traces more useful.
> >> The resubmission was blocked on stackdepot changes that are now merged,
> >> as explained in patch 2.
> >> 
> > 
> > Hello. I just started review/testing this series.
> > 
> > it crashed on my system (arm64)
> 
> Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> from memblock. arm64 must have memblock freeing happen earlier or something.
> (CCing memblock experts)
> 
> > I ran with boot parameter slub_debug=U, and without KASAN.
> > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> > 
> > void * __init memblock_alloc_try_nid(
> >                         phys_addr_t size, phys_addr_t align,
> >                         phys_addr_t min_addr, phys_addr_t max_addr,
> >                         int nid)
> > {
> >         void *ptr;
> > 
> >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> >                      __func__, (u64)size, (u64)align, nid, &min_addr,
> >                      &max_addr, (void *)_RET_IP_);
> >         ptr = memblock_alloc_internal(size, align,
> >                                            min_addr, max_addr, nid, false);
> >         if (ptr)
> >                 memset(ptr, 0, size); <--- Crash Here
> > 
> >         return ptr;
> > }
> > 
> > It crashed during create_boot_cache() -> stack_depot_init() ->
> > memblock_alloc().
> > 
> > I think That's because, in kmem_cache_init(), both slab and memblock is not
> > available. (AFAIU memblock is not available after mem_init() because of
> > memblock_free_all(), right?)
> 
> Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> But then, I would expect stack_depot_init() to detect that memblock_alloc()
> returns NULL, we print ""Stack Depot hash table allocation failed,
> disabling" and disable it. Instead it seems memblock_alloc() returns
> something that's already potentially used by somebody else? Sounds like a bug?


By the way, I fixed this by allowing stack_depot_init() to be called in
kmem_cache_init() too [1], and Marco suggested calling stack_depot_init()
depending on the slub_debug parameter, for simplicity [2].

I would prefer [2]. Would you take a look?

[1] https://lkml.org/lkml/2022/2/27/31

[2] https://lkml.org/lkml/2022/2/28/717

> > Thanks!
> > 
> > /*
> >  * Set up kernel memory allocators
> >  */
> > static void __init mm_init(void)
> > {
> >         /*
> >          * page_ext requires contiguous pages,
> >          * bigger than MAX_ORDER unless SPARSEMEM.
> >          */
> >         page_ext_init_flatmem();
> >         init_mem_debugging_and_hardening();
> >         kfence_alloc_pool();
> >         report_meminit();
> >         stack_depot_early_init();
> >         mem_init();
> >         mem_init_print_info();
> >         kmem_cache_init();
> >         /*
> >          * page_owner must be initialized after buddy is ready, and also after
> >          * slab is ready so that stack_depot_init() works properly
> >          */)
> > 
> >> Patch 1 is a new preparatory cleanup.
> >> 
> >> Patch 2 originally submitted here [1], was merged to mainline but
> >> reverted for stackdepot related issues as explained in the patch.
> >> 
> >> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> >> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> >> be considered too intrusive so I will postpone it for later. The docs
> >> patch is adjusted accordingly.
> >> 
> >> Also available in git, based on v5.17-rc1:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
> >> 
> >> I'd like to ask for some review before I add this to the slab tree.
> >> 
> >> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
> >> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
> >> 
> >> Oliver Glitta (4):
> >>   mm/slub: use stackdepot to save stack trace in objects
> >>   mm/slub: aggregate and print stack traces in debugfs files
> >>   mm/slub: sort debugfs output by frequency of stack traces
> >>   slab, documentation: add description of debugfs files for SLUB caches
> >> 
> >> Vlastimil Babka (1):
> >>   mm/slub: move struct track init out of set_track()
> >> 
> >>  Documentation/vm/slub.rst |  61 +++++++++++++++
> >>  init/Kconfig              |   1 +
> >>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
> >>  3 files changed, 162 insertions(+), 52 deletions(-)
> >> 
> >> -- 
> >> 2.35.1
> >> 
> >> 
> > 
> 

-- 
Thank you, You are awesome!
Hyeonggon :-)


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot
  2022-02-28 20:01     ` Mike Rapoport
  2022-02-28 21:20       ` Hyeonggon Yoo
@ 2022-02-28 23:38       ` Vlastimil Babka
  2022-03-01  9:21         ` Mike Rapoport
  1 sibling, 1 reply; 46+ messages in thread
From: Vlastimil Babka @ 2022-02-28 23:38 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Hyeonggon Yoo, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton, linux-mm, patches,
	linux-kernel, Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet,
	Randy Dunlap, Marco Elver, Karolina Drobnik

On 2/28/22 21:01, Mike Rapoport wrote:
> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
>> On 2/26/22 08:19, Hyeonggon Yoo wrote:
>> > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
>> >> Hi,
>> >> 
>> >> this series combines and revives patches from Oliver's last year
>> >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
>> >> files alloc_traces and free_traces more useful.
>> >> The resubmission was blocked on stackdepot changes that are now merged,
>> >> as explained in patch 2.
>> >> 
>> > 
>> > Hello. I just started review/testing this series.
>> > 
>> > it crashed on my system (arm64)
>> 
>> Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
>> from memblock. arm64 must have memblock freeing happen earlier or something.
>> (CCing memblock experts)
>> 
>> > I ran with boot parameter slub_debug=U, and without KASAN.
>> > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
>> > 
>> > void * __init memblock_alloc_try_nid(
>> >                         phys_addr_t size, phys_addr_t align,
>> >                         phys_addr_t min_addr, phys_addr_t max_addr,
>> >                         int nid)
>> > {
>> >         void *ptr;
>> > 
>> >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
>> >                      __func__, (u64)size, (u64)align, nid, &min_addr,
>> >                      &max_addr, (void *)_RET_IP_);
>> >         ptr = memblock_alloc_internal(size, align,
>> >                                            min_addr, max_addr, nid, false);
>> >         if (ptr)
>> >                 memset(ptr, 0, size); <--- Crash Here
>> > 
>> >         return ptr;
>> > }
>> > 
>> > It crashed during create_boot_cache() -> stack_depot_init() ->
>> > memblock_alloc().
>> > 
>> > I think That's because, in kmem_cache_init(), both slab and memblock is not
>> > available. (AFAIU memblock is not available after mem_init() because of
>> > memblock_free_all(), right?)
>> 
>> Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
>> But then, I would expect stack_depot_init() to detect that memblock_alloc()
>> returns NULL, we print ""Stack Depot hash table allocation failed,
>> disabling" and disable it. Instead it seems memblock_alloc() returns
>> something that's already potentially used by somebody else? Sounds like a bug?
> 
> If stack_depot_init() is called from kmem_cache_init(), there will be a
> confusion what allocator should be used because we use slab_is_available()
> to stop using memblock and start using kmalloc() instead in both
> stack_depot_init() and in memblock.

I did check that stack_depot_init() is called from kmem_cache_init()
*before* we make slab_is_available() true, hence assumed that memblock would
still be available at that point and expected no confusion. But it seems
that if memblock is already past memblock_free_all(), then it being still
available is just an illusion?

> Hyeonggon, did you run your tests with panic on warn at any chance?
>  
>> > Thanks!
>> > 
>> > /*
>> >  * Set up kernel memory allocators
>> >  */
>> > static void __init mm_init(void)
>> > {
>> >         /*
>> >          * page_ext requires contiguous pages,
>> >          * bigger than MAX_ORDER unless SPARSEMEM.
>> >          */
>> >         page_ext_init_flatmem();
>> >         init_mem_debugging_and_hardening();
>> >         kfence_alloc_pool();
>> >         report_meminit();
>> >         stack_depot_early_init();
>> >         mem_init();
>> >         mem_init_print_info();
>> >         kmem_cache_init();
>> >         /*
>> >          * page_owner must be initialized after buddy is ready, and also after
>> >          * slab is ready so that stack_depot_init() works properly
>> >          */)
>> > 
>> >> Patch 1 is a new preparatory cleanup.
>> >> 
>> >> Patch 2 originally submitted here [1], was merged to mainline but
>> >> reverted for stackdepot related issues as explained in the patch.
>> >> 
>> >> Patches 3-5 originally submitted as RFC here [2]. In this submission I
>> >> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
>> >> be considered too intrusive so I will postpone it for later. The docs
>> >> patch is adjusted accordingly.
>> >> 
>> >> Also available in git, based on v5.17-rc1:
>> >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
>> >> 
>> >> I'd like to ask for some review before I add this to the slab tree.
>> >> 
>> >> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
>> >> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
>> >> 
>> >> Oliver Glitta (4):
>> >>   mm/slub: use stackdepot to save stack trace in objects
>> >>   mm/slub: aggregate and print stack traces in debugfs files
>> >>   mm/slub: sort debugfs output by frequency of stack traces
>> >>   slab, documentation: add description of debugfs files for SLUB caches
>> >> 
>> >> Vlastimil Babka (1):
>> >>   mm/slub: move struct track init out of set_track()
>> >> 
>> >>  Documentation/vm/slub.rst |  61 +++++++++++++++
>> >>  init/Kconfig              |   1 +
>> >>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
>> >>  3 files changed, 162 insertions(+), 52 deletions(-)
>> >> 
>> >> -- 
>> >> 2.35.1
>> >> 
>> >> 
>> > 
>> 
> 



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] mm/slub: initialize stack depot in boot process
  2022-02-28 15:09           ` [PATCH] mm/slub: initialize stack depot in boot process Hyeonggon Yoo
  2022-02-28 16:28             ` Marco Elver
@ 2022-03-01  0:28             ` Vlastimil Babka
  1 sibling, 0 replies; 46+ messages in thread
From: Vlastimil Babka @ 2022-03-01  0:28 UTC (permalink / raw)
  To: Hyeonggon Yoo, Marco Elver
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Dmitry Vyukov, Eric Dumazet,
	Jarkko Sakkinen, Johannes Berg, Yury Norov, Arnd Bergmann,
	James Bottomley, Matteo Croce, Andrey Konovalov, Imran Khan,
	Zqiang

On 2/28/22 16:09, Hyeonggon Yoo wrote:
> commit ba10d4b46655 ("mm/slub: use stackdepot to save stack trace in
> objects") initializes stack depot while creating cache if SLAB_STORE_USER
> flag is set.
> 
> This can make kernel crash because a cache can be created in various
> contexts. For example if user sets slub_debug=U, kernel crashes
> because create_boot_cache() calls stack_depot_init(), which tries to
> allocate hash table using memblock_alloc() if slab is not available.
> But memblock is also not available at that time.
> 
> This patch solves the problem by initializing stack depot early
> in boot process if SLAB_STORE_USER debug flag is set globally
> or the flag is set to at least one cache.
> 
> [ elver@google.com: initialize stack depot depending on slub_debug
>   parameter instead of allowing stack_depot_init() can be called
>   in kmem_cache_init() for simplicity. ]
> 
> Link: https://lkml.org/lkml/2022/2/28/238
> Fixes: ba10d4b46655 ("mm/slub: use stackdepot to save stack trace in objects")
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

I think a much easier approach would be to do this checking in
setup_slub_debug(). There we may either detect SLAB_STORE_USER in
global_flags, or check the flags returned by parse_slub_debug_flags() in the
while (str) cycle, in the 'else' case where slab_list is present. Both cases
would just set some variable that stack_depot_early_init() (the
!CONFIG_STACKDEPOT_ALWAYS_INIT version, or a newly consolidated one) would
check. So that would be another way to request stack_depot_init() at a
well-defined point during boot, similar to CONFIG_STACKDEPOT_ALWAYS_INIT.
This works because setup_slub_debug() is called via __setup, which is
processed from start_kernel() -> parse_args() before mm_init() ->
stack_depot_early_init().
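
A rough sketch of that idea (all names below are illustrative, nothing here
is from an actual patch):

	/* lib/stackdepot.c */
	static bool stack_depot_early_init_requested __initdata;

	void __init stack_depot_request_early_init(void)
	{
		stack_depot_early_init_requested = true;
	}

	/* consolidated variant, still called from mm_init() before mem_init() */
	int __init stack_depot_early_init(void)
	{
		if (!IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT) &&
		    !stack_depot_early_init_requested)
			return 0;
		return stack_depot_init();
	}

	/* mm/slub.c, setup_slub_debug(): after parsing global_flags and the
	 * per-cache blocks, request early init if any of them set U
	 * (a no-op stub would cover !CONFIG_STACKDEPOT) */
	if (global_flags & SLAB_STORE_USER)
		stack_depot_request_early_init();

That way the hash table would always be allocated from memblock, before
mem_init() releases memblock's memory to the buddy allocator.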

> ---
>  include/linux/slab.h |  1 +
>  init/main.c          |  1 +
>  mm/slab.c            |  4 ++++
>  mm/slob.c            |  4 ++++
>  mm/slub.c            | 28 +++++++++++++++++++++++++---
>  5 files changed, 35 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 37bde99b74af..023f3f71ae35 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -139,6 +139,7 @@ struct mem_cgroup;
>  /*
>   * struct kmem_cache related prototypes
>   */
> +void __init kmem_cache_init_early(void);
>  void __init kmem_cache_init(void);
>  bool slab_is_available(void);
>  
> diff --git a/init/main.c b/init/main.c
> index 65fa2e41a9c0..4fdb7975a085 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -835,6 +835,7 @@ static void __init mm_init(void)
>  	kfence_alloc_pool();
>  	report_meminit();
>  	stack_depot_early_init();
> +	kmem_cache_init_early();
>  	mem_init();
>  	mem_init_print_info();
>  	kmem_cache_init();
> diff --git a/mm/slab.c b/mm/slab.c
> index ddf5737c63d9..80a6d01aab06 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1196,6 +1196,10 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>  	}
>  }
>  
> +void __init kmem_cache_init_early(void)
> +{
> +}
> +
>  /*
>   * Initialisation.  Called after the page allocator have been initialised and
>   * before smp_init().
> diff --git a/mm/slob.c b/mm/slob.c
> index 60c5842215f1..00e323af8be4 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -715,6 +715,10 @@ struct kmem_cache kmem_cache_boot = {
>  	.align = ARCH_KMALLOC_MINALIGN,
>  };
>  
> +void __init kmem_cache_init_early(void)
> +{
> +}
> +
>  void __init kmem_cache_init(void)
>  {
>  	kmem_cache = &kmem_cache_boot;
> diff --git a/mm/slub.c b/mm/slub.c
> index a74afe59a403..40bcd18143b6 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4221,9 +4221,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
>  	s->remote_node_defrag_ratio = 1000;
>  #endif
>  
> -	if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> -		stack_depot_init();
> -
>  	/* Initialize the pre-computed randomized freelist if slab is up */
>  	if (slab_state >= UP) {
>  		if (init_cache_random_seq(s))
> @@ -4810,6 +4807,31 @@ static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache)
>  	return s;
>  }
>  
> +/* Initialize stack depot if needed */
> +void __init kmem_cache_init_early(void)
> +{
> +#ifdef CONFIG_STACKDEPOT
> +	slab_flags_t block_flags;
> +	char *next_block;
> +	char *slab_list;
> +
> +	if (slub_debug & SLAB_STORE_USER)
> +		goto init_stack_depot;
> +
> +	next_block = slub_debug_string;
> +	while (next_block) {
> +		next_block = parse_slub_debug_flags(next_block, &block_flags, &slab_list, false);
> +		if (block_flags & SLAB_STORE_USER)
> +			goto init_stack_depot;
> +	}
> +
> +	return;
> +
> +init_stack_depot:
> +	stack_depot_init();
> +#endif
> +}
> +
>  void __init kmem_cache_init(void)
>  {
>  	static __initdata struct kmem_cache boot_kmem_cache,



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] mm/slub: initialize stack depot in boot process
  2022-02-28 16:28             ` Marco Elver
@ 2022-03-01  2:12               ` Hyeonggon Yoo
  0 siblings, 0 replies; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-03-01  2:12 UTC (permalink / raw)
  To: Marco Elver
  Cc: Vlastimil Babka, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton, linux-mm, patches,
	linux-kernel, Oliver Glitta, Faiyaz Mohammed, Dmitry Vyukov,
	Eric Dumazet, Jarkko Sakkinen, Johannes Berg, Yury Norov,
	Arnd Bergmann, James Bottomley, Matteo Croce, Andrey Konovalov,
	Imran Khan, Zqiang

On Mon, Feb 28, 2022 at 05:28:17PM +0100, Marco Elver wrote:
> On Mon, Feb 28, 2022 at 03:09PM +0000, Hyeonggon Yoo wrote:
> > commit ba10d4b46655 ("mm/slub: use stackdepot to save stack trace in
> > objects") initializes stack depot while creating cache if SLAB_STORE_USER
> > flag is set.
> > 
> > This can make kernel crash because a cache can be created in various
> > contexts. For example if user sets slub_debug=U, kernel crashes
> > because create_boot_cache() calls stack_depot_init(), which tries to
> > allocate hash table using memblock_alloc() if slab is not available.
> > But memblock is also not available at that time.
> > 
> > This patch solves the problem by initializing stack depot early
> > in boot process if SLAB_STORE_USER debug flag is set globally
> > or the flag is set to at least one cache.
> > 
> > [ elver@google.com: initialize stack depot depending on slub_debug
> >   parameter instead of allowing stack_depot_init() can be called
> >   in kmem_cache_init() for simplicity. ]
> > 
> > Link: https://lkml.org/lkml/2022/2/28/238
> 
> This would be a better permalink:
> https://lore.kernel.org/all/YhyeaP8lrzKgKm5A@ip-172-31-19-208.ap-northeast-1.compute.internal/
>

Agreed.

> > Fixes: ba10d4b46655 ("mm/slub: use stackdepot to save stack trace in objects")
> 
> This commit does not exist in -next.
> 

It has not landed in -next yet.

> I assume you intend that "lib/stackdepot: Use page allocator if both
> slab and memblock is unavailable" should be dropped now.
>

I did not intend that, but I agree the patch you mentioned
should be dropped now.

> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > ---
> >  include/linux/slab.h |  1 +
> >  init/main.c          |  1 +
> >  mm/slab.c            |  4 ++++
> >  mm/slob.c            |  4 ++++
> >  mm/slub.c            | 28 +++++++++++++++++++++++++---
> >  5 files changed, 35 insertions(+), 3 deletions(-)
> [...]
> >  
> > +/* Initialize stack depot if needed */
> > +void __init kmem_cache_init_early(void)
> > +{
> > +#ifdef CONFIG_STACKDEPOT
> > +	slab_flags_t block_flags;
> > +	char *next_block;
> > +	char *slab_list;
> > +
> > +	if (slub_debug & SLAB_STORE_USER)
> > +		goto init_stack_depot;
> > +
> > +	next_block = slub_debug_string;
> > +	while (next_block) {
> > +		next_block = parse_slub_debug_flags(next_block, &block_flags, &slab_list, false);
> > +		if (block_flags & SLAB_STORE_USER)
> > +			goto init_stack_depot;
> > +	}
> > +
> > +	return;
> > +
> > +init_stack_depot:
> > +	stack_depot_init();
> > +#endif
> > +}
> 
> You can simplify this function to avoid the goto:
> 
> 	/* Initialize stack depot if needed */
> 	void __init kmem_cache_init_early(void)
> 	{
> 	#ifdef CONFIG_STACKDEPOT
> 		slab_flags_t flags = slub_debug;
> 		char *next_block = slub_debug_string;
> 		char *slab_list;
> 	
> 		for (;;) {
> 			if (flags & SLAB_STORE_USER) {
> 				stack_depot_init();
> 				break;
> 			}
> 			if (!next_block)
> 				break;
> 			next_block = parse_slub_debug_flags(next_block, &flags, &slab_list, false);
> 		}
> 	#endif
> 	}
> 
> ^^ with this version, it'd also be much easier and less confusing to add
> other initialization logic unrelated to stackdepot later after the loop
> (should it ever be required).

Thank you for the nice suggestion, but I want to try it in
setup_slub_debug() as Vlastimil said!

Thanks.

-- 
Thank you, You are awesome!
Hyeonggon :-)


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot
  2022-02-28 23:38       ` Vlastimil Babka
@ 2022-03-01  9:21         ` Mike Rapoport
  2022-03-01  9:41           ` Vlastimil Babka
  0 siblings, 1 reply; 46+ messages in thread
From: Mike Rapoport @ 2022-03-01  9:21 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Hyeonggon Yoo, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton, linux-mm, patches,
	linux-kernel, Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet,
	Randy Dunlap, Marco Elver, Karolina Drobnik

On Tue, Mar 01, 2022 at 12:38:11AM +0100, Vlastimil Babka wrote:
> On 2/28/22 21:01, Mike Rapoport wrote:
> > On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> >> On 2/26/22 08:19, Hyeonggon Yoo wrote:
> >> > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> >> >> Hi,
> >> >> 
> >> >> this series combines and revives patches from Oliver's last year
> >> >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> >> >> files alloc_traces and free_traces more useful.
> >> >> The resubmission was blocked on stackdepot changes that are now merged,
> >> >> as explained in patch 2.
> >> >> 
> >> > 
> >> > Hello. I just started review/testing this series.
> >> > 
> >> > it crashed on my system (arm64)
> >> 
> >> Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> >> from memblock. arm64 must have memblock freeing happen earlier or something.
> >> (CCing memblock experts)
> >> 
> >> > I ran with boot parameter slub_debug=U, and without KASAN.
> >> > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> >> > 
> >> > void * __init memblock_alloc_try_nid(
> >> >                         phys_addr_t size, phys_addr_t align,
> >> >                         phys_addr_t min_addr, phys_addr_t max_addr,
> >> >                         int nid)
> >> > {
> >> >         void *ptr;
> >> > 
> >> >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> >> >                      __func__, (u64)size, (u64)align, nid, &min_addr,
> >> >                      &max_addr, (void *)_RET_IP_);
> >> >         ptr = memblock_alloc_internal(size, align,
> >> >                                            min_addr, max_addr, nid, false);
> >> >         if (ptr)
> >> >                 memset(ptr, 0, size); <--- Crash Here
> >> > 
> >> >         return ptr;
> >> > }
> >> > 
> >> > It crashed during create_boot_cache() -> stack_depot_init() ->
> >> > memblock_alloc().
> >> > 
> >> > I think That's because, in kmem_cache_init(), both slab and memblock is not
> >> > available. (AFAIU memblock is not available after mem_init() because of
> >> > memblock_free_all(), right?)
> >> 
> >> Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> >> But then, I would expect stack_depot_init() to detect that memblock_alloc()
> >> returns NULL, we print ""Stack Depot hash table allocation failed,
> >> disabling" and disable it. Instead it seems memblock_alloc() returns
> >> something that's already potentially used by somebody else? Sounds like a bug?
> > 
> > If stack_depot_init() is called from kmem_cache_init(), there will be a
> > confusion what allocator should be used because we use slab_is_available()
> > to stop using memblock and start using kmalloc() instead in both
> > stack_depot_init() and in memblock.
> 
> I did check that stack_depot_init() is called from kmem_cache_init()
> *before* we make slab_is_available() true, hence assumed that memblock would
> be still available at that point and expected no confusion. But seems if
> memblock is already beyond memblock_free_all() then it being still available
> is just an illusion?

Yeah, it appears it is an illusion :)

I think we have to deal with allocations that happen between
memblock_free_all() and slab_is_available() at the memblock level and then
figure out where to put stack_depot_init() and how to allocate memory
there.

I believe something like this (untested) patch below addresses the first
issue. As for stack_depot_init() I'm still trying to figure out the
possible call paths, but it seems we can use stack_depot_early_init() for the
SLUB debugging case. I'll try to come up with something Really Soon (tm).

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 50ad19662a32..4ea89d44d22a 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -90,6 +90,7 @@ struct memblock_type {
  */
 struct memblock {
 	bool bottom_up;  /* is bottom up direction? */
+	bool mem_freed;
 	phys_addr_t current_limit;
 	struct memblock_type memory;
 	struct memblock_type reserved;
diff --git a/mm/memblock.c b/mm/memblock.c
index b12a364f2766..60196dc4980e 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -120,6 +120,7 @@ struct memblock memblock __initdata_memblock = {
 	.reserved.name		= "reserved",
 
 	.bottom_up		= false,
+	.mem_freed		= false,
 	.current_limit		= MEMBLOCK_ALLOC_ANYWHERE,
 };
 
@@ -1487,6 +1488,13 @@ static void * __init memblock_alloc_internal(
 	if (WARN_ON_ONCE(slab_is_available()))
 		return kzalloc_node(size, GFP_NOWAIT, nid);
 
+	if (memblock.mem_freed) {
+		unsigned int order = get_order(size);
+
+		pr_warn("memblock: allocating from buddy\n");
+		return __alloc_pages_node(nid, order, GFP_KERNEL);
+	}
+
 	if (max_addr > memblock.current_limit)
 		max_addr = memblock.current_limit;
 
@@ -2116,6 +2124,7 @@ void __init memblock_free_all(void)
 
 	pages = free_low_memory_core_early();
 	totalram_pages_add(pages);
+	memblock.mem_freed = true;
 }
 
 #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
 
> > Hyeonggon, did you run your tests with panic on warn at any chance?
> >  
> >> > Thanks!
> >> > 
> >> > /*
> >> >  * Set up kernel memory allocators
> >> >  */
> >> > static void __init mm_init(void)
> >> > {
> >> >         /*
> >> >          * page_ext requires contiguous pages,
> >> >          * bigger than MAX_ORDER unless SPARSEMEM.
> >> >          */
> >> >         page_ext_init_flatmem();
> >> >         init_mem_debugging_and_hardening();
> >> >         kfence_alloc_pool();
> >> >         report_meminit();
> >> >         stack_depot_early_init();
> >> >         mem_init();
> >> >         mem_init_print_info();
> >> >         kmem_cache_init();
> >> >         /*
> >> >          * page_owner must be initialized after buddy is ready, and also after
> >> >          * slab is ready so that stack_depot_init() works properly
> >> >          */)
> >> > 
> >> >> Patch 1 is a new preparatory cleanup.
> >> >> 
> >> >> Patch 2 originally submitted here [1], was merged to mainline but
> >> >> reverted for stackdepot related issues as explained in the patch.
> >> >> 
> >> >> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> >> >> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> >> >> be considered too intrusive so I will postpone it for later. The docs
> >> >> patch is adjusted accordingly.
> >> >> 
> >> >> Also available in git, based on v5.17-rc1:
> >> >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
> >> >> 
> >> >> I'd like to ask for some review before I add this to the slab tree.
> >> >> 
> >> >> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
> >> >> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
> >> >> 
> >> >> Oliver Glitta (4):
> >> >>   mm/slub: use stackdepot to save stack trace in objects
> >> >>   mm/slub: aggregate and print stack traces in debugfs files
> >> >>   mm/slub: sort debugfs output by frequency of stack traces
> >> >>   slab, documentation: add description of debugfs files for SLUB caches
> >> >> 
> >> >> Vlastimil Babka (1):
> >> >>   mm/slub: move struct track init out of set_track()
> >> >> 
> >> >>  Documentation/vm/slub.rst |  61 +++++++++++++++
> >> >>  init/Kconfig              |   1 +
> >> >>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
> >> >>  3 files changed, 162 insertions(+), 52 deletions(-)
> >> >> 
> >> >> -- 
> >> >> 2.35.1
> >> >> 
> >> >> 
> >> > 
> >> 
> > 
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot
  2022-02-28 21:27     ` Hyeonggon Yoo
@ 2022-03-01  9:23       ` Mike Rapoport
  2022-03-02  8:37       ` Mike Rapoport
  1 sibling, 0 replies; 46+ messages in thread
From: Mike Rapoport @ 2022-03-01  9:23 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Vlastimil Babka, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton, linux-mm, patches,
	linux-kernel, Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet,
	Randy Dunlap, Marco Elver, Karolina Drobnik

Hi,

On Mon, Feb 28, 2022 at 09:27:02PM +0000, Hyeonggon Yoo wrote:
> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> > On 2/26/22 08:19, Hyeonggon Yoo wrote:
> > > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> > >> Hi,
> > >> 
> > >> this series combines and revives patches from Oliver's last year
> > >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> > >> files alloc_traces and free_traces more useful.
> > >> The resubmission was blocked on stackdepot changes that are now merged,
> > >> as explained in patch 2.
> > >> 
> > > 
> > > Hello. I just started review/testing this series.
> > > 
> > > it crashed on my system (arm64)
> > 
> > Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> > from memblock. arm64 must have memblock freeing happen earlier or something.
> > (CCing memblock experts)
> > 
> > > I ran with boot parameter slub_debug=U, and without KASAN.
> > > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> > > 
> > > void * __init memblock_alloc_try_nid(
> > >                         phys_addr_t size, phys_addr_t align,
> > >                         phys_addr_t min_addr, phys_addr_t max_addr,
> > >                         int nid)
> > > {
> > >         void *ptr;
> > > 
> > >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> > >                      __func__, (u64)size, (u64)align, nid, &min_addr,
> > >                      &max_addr, (void *)_RET_IP_);
> > >         ptr = memblock_alloc_internal(size, align,
> > >                                            min_addr, max_addr, nid, false);
> > >         if (ptr)
> > >                 memset(ptr, 0, size); <--- Crash Here
> > > 
> > >         return ptr;
> > > }
> > > 
> > > It crashed during create_boot_cache() -> stack_depot_init() ->
> > > memblock_alloc().
> > > 
> > > I think That's because, in kmem_cache_init(), both slab and memblock is not
> > > available. (AFAIU memblock is not available after mem_init() because of
> > > memblock_free_all(), right?)
> > 
> > Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> > But then, I would expect stack_depot_init() to detect that memblock_alloc()
> > returns NULL, we print ""Stack Depot hash table allocation failed,
> > disabling" and disable it. Instead it seems memblock_alloc() returns
> > something that's already potentially used by somebody else? Sounds like a bug?
> 
> 
> By the way, I fixed this by allowing stack_depot_init() to be called in
> kmem_cache_init() too [1] and Marco suggested that calling
> stack_depot_init() depending on slub_debug parameter for simplicity. [2]
> 
> I would prefer [2], Would you take a look?
> 
> [1] https://lkml.org/lkml/2022/2/27/31
> 
> [2] https://lkml.org/lkml/2022/2/28/717

I'm still looking at stack_depot_init() callers, but I think it's possible
to make stack_depot_early_init() a proper function that calls
memblock_alloc(), and to only use stack_depot_init() when slab is actually
available.
 
> > > Thanks!
> > > 
> > > /*
> > >  * Set up kernel memory allocators
> > >  */
> > > static void __init mm_init(void)
> > > {
> > >         /*
> > >          * page_ext requires contiguous pages,
> > >          * bigger than MAX_ORDER unless SPARSEMEM.
> > >          */
> > >         page_ext_init_flatmem();
> > >         init_mem_debugging_and_hardening();
> > >         kfence_alloc_pool();
> > >         report_meminit();
> > >         stack_depot_early_init();
> > >         mem_init();
> > >         mem_init_print_info();
> > >         kmem_cache_init();
> > >         /*
> > >          * page_owner must be initialized after buddy is ready, and also after
> > >          * slab is ready so that stack_depot_init() works properly
> > >          */)
> > > 
> > >> Patch 1 is a new preparatory cleanup.
> > >> 
> > >> Patch 2 originally submitted here [1], was merged to mainline but
> > >> reverted for stackdepot related issues as explained in the patch.
> > >> 
> > >> Patches 3-5 originally submitted as RFC here [2]. In this submission I
> > >> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
> > >> be considered too intrusive so I will postpone it for later. The docs
> > >> patch is adjusted accordingly.
> > >> 
> > >> Also available in git, based on v5.17-rc1:
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
> > >> 
> > >> I'd like to ask for some review before I add this to the slab tree.
> > >> 
> > >> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
> > >> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
> > >> 
> > >> Oliver Glitta (4):
> > >>   mm/slub: use stackdepot to save stack trace in objects
> > >>   mm/slub: aggregate and print stack traces in debugfs files
> > >>   mm/slub: sort debugfs output by frequency of stack traces
> > >>   slab, documentation: add description of debugfs files for SLUB caches
> > >> 
> > >> Vlastimil Babka (1):
> > >>   mm/slub: move struct track init out of set_track()
> > >> 
> > >>  Documentation/vm/slub.rst |  61 +++++++++++++++
> > >>  init/Kconfig              |   1 +
> > >>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
> > >>  3 files changed, 162 insertions(+), 52 deletions(-)
> > >> 
> > >> -- 
> > >> 2.35.1
> > >> 
> > >> 
> > > 
> > 
> 
> -- 
> Thank you, You are awesome!
> Hyeonggon :-)

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot
  2022-03-01  9:21         ` Mike Rapoport
@ 2022-03-01  9:41           ` Vlastimil Babka
  2022-03-01 14:52             ` Mike Rapoport
  0 siblings, 1 reply; 46+ messages in thread
From: Vlastimil Babka @ 2022-03-01  9:41 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Hyeonggon Yoo, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton, linux-mm, patches,
	linux-kernel, Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet,
	Randy Dunlap, Marco Elver, Karolina Drobnik

On 3/1/22 10:21, Mike Rapoport wrote:
> On Tue, Mar 01, 2022 at 12:38:11AM +0100, Vlastimil Babka wrote:
>> On 2/28/22 21:01, Mike Rapoport wrote:
>> > On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
>> >> On 2/26/22 08:19, Hyeonggon Yoo wrote:
>> >> > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
>> >> >> Hi,
>> >> >> 
>> >> >> this series combines and revives patches from Oliver's last year
>> >> >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
>> >> >> files alloc_traces and free_traces more useful.
>> >> >> The resubmission was blocked on stackdepot changes that are now merged,
>> >> >> as explained in patch 2.
>> >> >> 
>> >> > 
>> >> > Hello. I just started review/testing this series.
>> >> > 
>> >> > it crashed on my system (arm64)
>> >> 
>> >> Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
>> >> from memblock. arm64 must have memblock freeing happen earlier or something.
>> >> (CCing memblock experts)
>> >> 
>> >> > I ran with boot parameter slub_debug=U, and without KASAN.
>> >> > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
>> >> > 
>> >> > void * __init memblock_alloc_try_nid(
>> >> >                         phys_addr_t size, phys_addr_t align,
>> >> >                         phys_addr_t min_addr, phys_addr_t max_addr,
>> >> >                         int nid)
>> >> > {
>> >> >         void *ptr;
>> >> > 
>> >> >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
>> >> >                      __func__, (u64)size, (u64)align, nid, &min_addr,
>> >> >                      &max_addr, (void *)_RET_IP_);
>> >> >         ptr = memblock_alloc_internal(size, align,
>> >> >                                            min_addr, max_addr, nid, false);
>> >> >         if (ptr)
>> >> >                 memset(ptr, 0, size); <--- Crash Here
>> >> > 
>> >> >         return ptr;
>> >> > }
>> >> > 
>> >> > It crashed during create_boot_cache() -> stack_depot_init() ->
>> >> > memblock_alloc().
>> >> > 
>> >> > I think That's because, in kmem_cache_init(), both slab and memblock is not
>> >> > available. (AFAIU memblock is not available after mem_init() because of
>> >> > memblock_free_all(), right?)
>> >> 
>> >> Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
>> >> But then, I would expect stack_depot_init() to detect that memblock_alloc()
>> >> returns NULL, we print ""Stack Depot hash table allocation failed,
>> >> disabling" and disable it. Instead it seems memblock_alloc() returns
>> >> something that's already potentially used by somebody else? Sounds like a bug?
>> > 
>> > If stack_depot_init() is called from kmem_cache_init(), there will be a
>> > confusion what allocator should be used because we use slab_is_available()
>> > to stop using memblock and start using kmalloc() instead in both
>> > stack_depot_init() and in memblock.
>> 
>> I did check that stack_depot_init() is called from kmem_cache_init()
>> *before* we make slab_is_available() true, hence assumed that memblock would
>> be still available at that point and expected no confusion. But seems if
>> memblock is already beyond memblock_free_all() then it being still available
>> is just an illusion?
> 
> Yeah, it appears it is an illusion :)
> 
> I think we have to deal with allocations that happen between
> memblock_free_all() and slab_is_available() at the memblock level and then
> figure out the where to put stack_depot_init() and how to allocate memory
> there.
> 
> I believe something like this (untested) patch below addresses the first
> issue. As for stack_depot_init() I'm still trying to figure out the
> possible call paths, but it seems we can use stack_depot_early_init() for
> SLUB debugging case. I'll try to come up with something Really Soon (tm).

Yeah as you already noticed, we are pursuing an approach to decide on
calling stack_depot_early_init(), which should be a good way to solve this
given how special slab is in this case. For memblock I just wanted to point
out that it could be more robust; your patch below seems to be on the right
path. Maybe it just doesn't have to fall back to buddy, which could be
considered a layering violation, but instead return NULL that can be
immediately recognized as an error?
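
Something like this (completely untested, just to illustrate what I mean)
instead of the buddy fallback in your patch:

	/* after memblock_free_all() the memory belongs to buddy, just refuse */
	if (WARN_ON_ONCE(memblock.mem_freed))
		return NULL;

Then stack_depot_init() would hit its existing "hash table allocation failed,
disabling" path instead of scribbling over memory that buddy may have already
handed out to somebody else.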

> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 50ad19662a32..4ea89d44d22a 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -90,6 +90,7 @@ struct memblock_type {
>   */
>  struct memblock {
>  	bool bottom_up;  /* is bottom up direction? */
> +	bool mem_freed;
>  	phys_addr_t current_limit;
>  	struct memblock_type memory;
>  	struct memblock_type reserved;
> diff --git a/mm/memblock.c b/mm/memblock.c
> index b12a364f2766..60196dc4980e 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -120,6 +120,7 @@ struct memblock memblock __initdata_memblock = {
>  	.reserved.name		= "reserved",
>  
>  	.bottom_up		= false,
> +	.mem_freed		= false,
>  	.current_limit		= MEMBLOCK_ALLOC_ANYWHERE,
>  };
>  
> @@ -1487,6 +1488,13 @@ static void * __init memblock_alloc_internal(
>  	if (WARN_ON_ONCE(slab_is_available()))
>  		return kzalloc_node(size, GFP_NOWAIT, nid);
>  
> +	if (memblock.mem_freed) {
> +		unsigned int order = get_order(size);
> +
> +		pr_warn("memblock: allocating from buddy\n");
> +		return __alloc_pages_node(nid, order, GFP_KERNEL);
> +	}
> +
>  	if (max_addr > memblock.current_limit)
>  		max_addr = memblock.current_limit;
>  
> @@ -2116,6 +2124,7 @@ void __init memblock_free_all(void)
>  
>  	pages = free_low_memory_core_early();
>  	totalram_pages_add(pages);
> +	memblock.mem_freed = true;
>  }
>  
>  #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
>  
>> > Hyeonggon, did you run your tests with panic on warn at any chance?
>> >  
>> >> > Thanks!
>> >> > 
>> >> > /*
>> >> >  * Set up kernel memory allocators
>> >> >  */
>> >> > static void __init mm_init(void)
>> >> > {
>> >> >         /*
>> >> >          * page_ext requires contiguous pages,
>> >> >          * bigger than MAX_ORDER unless SPARSEMEM.
>> >> >          */
>> >> >         page_ext_init_flatmem();
>> >> >         init_mem_debugging_and_hardening();
>> >> >         kfence_alloc_pool();
>> >> >         report_meminit();
>> >> >         stack_depot_early_init();
>> >> >         mem_init();
>> >> >         mem_init_print_info();
>> >> >         kmem_cache_init();
>> >> >         /*
>> >> >          * page_owner must be initialized after buddy is ready, and also after
>> >> >          * slab is ready so that stack_depot_init() works properly
>> >> >          */)
>> >> > 
>> >> >> Patch 1 is a new preparatory cleanup.
>> >> >> 
>> >> >> Patch 2 originally submitted here [1], was merged to mainline but
>> >> >> reverted for stackdepot related issues as explained in the patch.
>> >> >> 
>> >> >> Patches 3-5 originally submitted as RFC here [2]. In this submission I
>> >> >> have omitted the new file 'all_objects' (patch 3/3 in [2]) as it might
>> >> >> be considered too intrusive so I will postpone it for later. The docs
>> >> >> patch is adjusted accordingly.
>> >> >> 
>> >> >> Also available in git, based on v5.17-rc1:
>> >> >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-stackdepot-v1
>> >> >> 
>> >> >> I'd like to ask for some review before I add this to the slab tree.
>> >> >> 
>> >> >> [1] https://lore.kernel.org/all/20210414163434.4376-1-glittao@gmail.com/
>> >> >> [2] https://lore.kernel.org/all/20210521121127.24653-1-glittao@gmail.com/
>> >> >> 
>> >> >> Oliver Glitta (4):
>> >> >>   mm/slub: use stackdepot to save stack trace in objects
>> >> >>   mm/slub: aggregate and print stack traces in debugfs files
>> >> >>   mm/slub: sort debugfs output by frequency of stack traces
>> >> >>   slab, documentation: add description of debugfs files for SLUB caches
>> >> >> 
>> >> >> Vlastimil Babka (1):
>> >> >>   mm/slub: move struct track init out of set_track()
>> >> >> 
>> >> >>  Documentation/vm/slub.rst |  61 +++++++++++++++
>> >> >>  init/Kconfig              |   1 +
>> >> >>  mm/slub.c                 | 152 +++++++++++++++++++++++++-------------
>> >> >>  3 files changed, 162 insertions(+), 52 deletions(-)
>> >> >> 
>> >> >> -- 
>> >> >> 2.35.1
>> >> >> 
>> >> >> 
>> >> > 
>> >> 
>> > 
>> 
> 



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot
  2022-03-01  9:41           ` Vlastimil Babka
@ 2022-03-01 14:52             ` Mike Rapoport
  0 siblings, 0 replies; 46+ messages in thread
From: Mike Rapoport @ 2022-03-01 14:52 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Hyeonggon Yoo, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton, linux-mm, patches,
	linux-kernel, Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet,
	Randy Dunlap, Marco Elver, Karolina Drobnik

On Tue, Mar 01, 2022 at 10:41:10AM +0100, Vlastimil Babka wrote:
> On 3/1/22 10:21, Mike Rapoport wrote:
> > On Tue, Mar 01, 2022 at 12:38:11AM +0100, Vlastimil Babka wrote:
> >> On 2/28/22 21:01, Mike Rapoport wrote:
> >> > On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> >> > 
> >> > If stack_depot_init() is called from kmem_cache_init(), there will be a
> >> > confusion what allocator should be used because we use slab_is_available()
> >> > to stop using memblock and start using kmalloc() instead in both
> >> > stack_depot_init() and in memblock.
> >> 
> >> I did check that stack_depot_init() is called from kmem_cache_init()
> >> *before* we make slab_is_available() true, hence assumed that memblock would
> >> be still available at that point and expected no confusion. But seems if
> >> memblock is already beyond memblock_free_all() then it being still available
> >> is just an illusion?
> > 
> > Yeah, it appears it is an illusion :)
> > 
> > I think we have to deal with allocations that happen between
> > memblock_free_all() and slab_is_available() at the memblock level and then
> > figure out the where to put stack_depot_init() and how to allocate memory
> > there.
> > 
> > I believe something like this (untested) patch below addresses the first
> > issue. As for stack_depot_init() I'm still trying to figure out the
> > possible call paths, but it seems we can use stack_depot_early_init() for
> > SLUB debugging case. I'll try to come up with something Really Soon (tm).
> 
> Yeah as you already noticed, we are pursuing an approach to decide on
> calling stack_depot_early_init(), which should be a good way to solve this
> given how special slab is in this case. For memblock I just wanted to point
> out that it could be more robust, your patch below seems to be on the right
> patch. Maybe it just doesn't have to fallback to buddy, which could be
> considered a layering violation, but just return NULL that can be
> immediately recognized as an error?

The layering violation is there anyway for the slab_is_available() case, so
adding a __alloc_pages() there would only be consistent.
 
> > diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> > index 50ad19662a32..4ea89d44d22a 100644
> > --- a/include/linux/memblock.h
> > +++ b/include/linux/memblock.h
> > @@ -90,6 +90,7 @@ struct memblock_type {
> >   */
> >  struct memblock {
> >  	bool bottom_up;  /* is bottom up direction? */
> > +	bool mem_freed;
> >  	phys_addr_t current_limit;
> >  	struct memblock_type memory;
> >  	struct memblock_type reserved;
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index b12a364f2766..60196dc4980e 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -120,6 +120,7 @@ struct memblock memblock __initdata_memblock = {
> >  	.reserved.name		= "reserved",
> >  
> >  	.bottom_up		= false,
> > +	.mem_freed		= false,
> >  	.current_limit		= MEMBLOCK_ALLOC_ANYWHERE,
> >  };
> >  
> > @@ -1487,6 +1488,13 @@ static void * __init memblock_alloc_internal(
> >  	if (WARN_ON_ONCE(slab_is_available()))
> >  		return kzalloc_node(size, GFP_NOWAIT, nid);
> >  
> > +	if (memblock.mem_freed) {
> > +		unsigned int order = get_order(size);
> > +
> > +		pr_warn("memblock: allocating from buddy\n");
> > +		return __alloc_pages_node(nid, order, GFP_KERNEL);
> > +	}
> > +
> >  	if (max_addr > memblock.current_limit)
> >  		max_addr = memblock.current_limit;
> >  
> > @@ -2116,6 +2124,7 @@ void __init memblock_free_all(void)
> >  
> >  	pages = free_low_memory_core_early();
> >  	totalram_pages_add(pages);
> > +	memblock.mem_freed = true;
> >  }
> >  
> >  #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
> >  

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot
  2022-02-28 21:27     ` Hyeonggon Yoo
  2022-03-01  9:23       ` Mike Rapoport
@ 2022-03-02  8:37       ` Mike Rapoport
  2022-03-02  9:09         ` Vlastimil Babka
  1 sibling, 1 reply; 46+ messages in thread
From: Mike Rapoport @ 2022-03-02  8:37 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Vlastimil Babka, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton, linux-mm, patches,
	linux-kernel, Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet,
	Randy Dunlap, Marco Elver, Karolina Drobnik

On Mon, Feb 28, 2022 at 09:27:02PM +0000, Hyeonggon Yoo wrote:
> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> > On 2/26/22 08:19, Hyeonggon Yoo wrote:
> > > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> > >> Hi,
> > >> 
> > >> this series combines and revives patches from Oliver's last year
> > >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> > >> files alloc_traces and free_traces more useful.
> > >> The resubmission was blocked on stackdepot changes that are now merged,
> > >> as explained in patch 2.
> > >> 
> > > 
> > > Hello. I just started review/testing this series.
> > > 
> > > it crashed on my system (arm64)
> > 
> > Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> > from memblock. arm64 must have memblock freeing happen earlier or something.
> > (CCing memblock experts)
> > 
> > > I ran with boot parameter slub_debug=U, and without KASAN.
> > > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> > > 
> > > void * __init memblock_alloc_try_nid(
> > >                         phys_addr_t size, phys_addr_t align,
> > >                         phys_addr_t min_addr, phys_addr_t max_addr,
> > >                         int nid)
> > > {
> > >         void *ptr;
> > > 
> > >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> > >                      __func__, (u64)size, (u64)align, nid, &min_addr,
> > >                      &max_addr, (void *)_RET_IP_);
> > >         ptr = memblock_alloc_internal(size, align,
> > >                                            min_addr, max_addr, nid, false);
> > >         if (ptr)
> > >                 memset(ptr, 0, size); <--- Crash Here
> > > 
> > >         return ptr;
> > > }
> > > 
> > > It crashed during create_boot_cache() -> stack_depot_init() ->
> > > memblock_alloc().
> > > 
> > > I think That's because, in kmem_cache_init(), both slab and memblock is not
> > > available. (AFAIU memblock is not available after mem_init() because of
> > > memblock_free_all(), right?)
> > 
> > Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> > But then, I would expect stack_depot_init() to detect that memblock_alloc()
> > returns NULL, we print ""Stack Depot hash table allocation failed,
> > disabling" and disable it. Instead it seems memblock_alloc() returns
> > something that's already potentially used by somebody else? Sounds like a bug?
> 
> 
> By the way, I fixed this by allowing stack_depot_init() to be called in
> kmem_cache_init() too [1] and Marco suggested that calling
> stack_depot_init() depending on slub_debug parameter for simplicity. [2]
> 
> I would prefer [2], Would you take a look?
> 
> [1] https://lkml.org/lkml/2022/2/27/31
> 
> [2] https://lkml.org/lkml/2022/2/28/717

I have the third version :)

diff --git a/mm/slub.c b/mm/slub.c
index a74afe59a403..0c3ab2335b46 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1548,6 +1548,10 @@ static int __init setup_slub_debug(char *str)
 	}
 out:
 	slub_debug = global_flags;
+
+	if (slub_flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
+		stack_depot_early_init();
+
 	if (slub_debug != 0 || slub_debug_string)
 		static_branch_enable(&slub_debug_enabled);
 	else
@@ -4221,9 +4225,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
 	s->remote_node_defrag_ratio = 1000;
 #endif
 
-	if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
-		stack_depot_init();
-
 	/* Initialize the pre-computed randomized freelist if slab is up */
 	if (slab_state >= UP) {
 		if (init_cache_random_seq(s))
 
> -- 
> Thank you, You are awesome!
> Hyeonggon :-)

-- 
Sincerely yours,
Mike.


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot
  2022-03-02  8:37       ` Mike Rapoport
@ 2022-03-02  9:09         ` Vlastimil Babka
  2022-03-02 12:30           ` Mike Rapoport
  0 siblings, 1 reply; 46+ messages in thread
From: Vlastimil Babka @ 2022-03-02  9:09 UTC (permalink / raw)
  To: Mike Rapoport, Hyeonggon Yoo
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet, Randy Dunlap,
	Marco Elver, Karolina Drobnik

On 3/2/22 09:37, Mike Rapoport wrote:
> On Mon, Feb 28, 2022 at 09:27:02PM +0000, Hyeonggon Yoo wrote:
>> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
>> > On 2/26/22 08:19, Hyeonggon Yoo wrote:
>> > > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
>> > >> Hi,
>> > >> 
>> > >> this series combines and revives patches from Oliver's last year
>> > >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
>> > >> files alloc_traces and free_traces more useful.
>> > >> The resubmission was blocked on stackdepot changes that are now merged,
>> > >> as explained in patch 2.
>> > >> 
>> > > 
>> > > Hello. I just started review/testing this series.
>> > > 
>> > > it crashed on my system (arm64)
>> > 
>> > Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
>> > from memblock. arm64 must have memblock freeing happen earlier or something.
>> > (CCing memblock experts)
>> > 
>> > > I ran with boot parameter slub_debug=U, and without KASAN.
>> > > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
>> > > 
>> > > void * __init memblock_alloc_try_nid(
>> > >                         phys_addr_t size, phys_addr_t align,
>> > >                         phys_addr_t min_addr, phys_addr_t max_addr,
>> > >                         int nid)
>> > > {
>> > >         void *ptr;
>> > > 
>> > >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
>> > >                      __func__, (u64)size, (u64)align, nid, &min_addr,
>> > >                      &max_addr, (void *)_RET_IP_);
>> > >         ptr = memblock_alloc_internal(size, align,
>> > >                                            min_addr, max_addr, nid, false);
>> > >         if (ptr)
>> > >                 memset(ptr, 0, size); <--- Crash Here
>> > > 
>> > >         return ptr;
>> > > }
>> > > 
>> > > It crashed during create_boot_cache() -> stack_depot_init() ->
>> > > memblock_alloc().
>> > > 
>> > > I think That's because, in kmem_cache_init(), both slab and memblock is not
>> > > available. (AFAIU memblock is not available after mem_init() because of
>> > > memblock_free_all(), right?)
>> > 
>> > Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
>> > But then, I would expect stack_depot_init() to detect that memblock_alloc()
>> > returns NULL, we print ""Stack Depot hash table allocation failed,
>> > disabling" and disable it. Instead it seems memblock_alloc() returns
>> > something that's already potentially used by somebody else? Sounds like a bug?
>> 
>> 
>> By the way, I fixed this by allowing stack_depot_init() to be called in
>> kmem_cache_init() too [1] and Marco suggested that calling
>> stack_depot_init() depending on slub_debug parameter for simplicity. [2]
>> 
>> I would prefer [2], Would you take a look?
>> 
>> [1] https://lkml.org/lkml/2022/2/27/31
>> 
>> [2] https://lkml.org/lkml/2022/2/28/717
> 
> I have the third version :)

While simple, it changes the timing of stack_depot_early_init(), which was
supposed to have a single callsite - now it's less predictable and depends
on e.g. kernel parameter ordering. Some arch/config combo could break,
dunno. Setting a variable that stack_depot_early_init() checks should be
more robust.
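
Roughly what I have in mind (untested, and the names are made up):

/* lib/stackdepot.c */
static bool stack_depot_needed_early __initdata =
		IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT);

void __init stack_depot_want_early_init(void)
{
	stack_depot_needed_early = true;
}

int __init stack_depot_early_init(void)
{
	/* the single callsite stays in mm_init(), before memblock_free_all() */
	if (!stack_depot_needed_early || stack_table)
		return 0;

	return stack_depot_init();
}

/* mm/slub.c, at the end of setup_slub_debug(); this would also have to look
 * at the cache-specific blocks, not just the global flags */
	if (IS_ENABLED(CONFIG_STACKDEPOT) && (global_flags & SLAB_STORE_USER))
		stack_depot_want_early_init();

That way mm_init() stays in control of when the hash table is allocated and
the kernel parameter ordering doesn't matter.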

> 
> diff --git a/mm/slub.c b/mm/slub.c
> index a74afe59a403..0c3ab2335b46 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1548,6 +1548,10 @@ static int __init setup_slub_debug(char *str)
>  	}
>  out:
>  	slub_debug = global_flags;
> +
> +	if (slub_flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> +		stack_depot_early_init();
> +
>  	if (slub_debug != 0 || slub_debug_string)
>  		static_branch_enable(&slub_debug_enabled);
>  	else
> @@ -4221,9 +4225,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
>  	s->remote_node_defrag_ratio = 1000;
>  #endif
>  
> -	if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> -		stack_depot_init();
> -
>  	/* Initialize the pre-computed randomized freelist if slab is up */
>  	if (slab_state >= UP) {
>  		if (init_cache_random_seq(s))
>  
>> -- 
>> Thank you, You are awesome!
>> Hyeonggon :-)
> 



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot
  2022-03-02  9:09         ` Vlastimil Babka
@ 2022-03-02 12:30           ` Mike Rapoport
  2022-03-02 17:02             ` Hyeonggon Yoo
  0 siblings, 1 reply; 46+ messages in thread
From: Mike Rapoport @ 2022-03-02 12:30 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Hyeonggon Yoo, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton, linux-mm, patches,
	linux-kernel, Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet,
	Randy Dunlap, Marco Elver, Karolina Drobnik

On Wed, Mar 02, 2022 at 10:09:37AM +0100, Vlastimil Babka wrote:
> On 3/2/22 09:37, Mike Rapoport wrote:
> > On Mon, Feb 28, 2022 at 09:27:02PM +0000, Hyeonggon Yoo wrote:
> >> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> >> > On 2/26/22 08:19, Hyeonggon Yoo wrote:
> >> > > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> >> > >> Hi,
> >> > >> 
> >> > >> this series combines and revives patches from Oliver's last year
> >> > >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> >> > >> files alloc_traces and free_traces more useful.
> >> > >> The resubmission was blocked on stackdepot changes that are now merged,
> >> > >> as explained in patch 2.
> >> > >> 
> >> > > 
> >> > > Hello. I just started review/testing this series.
> >> > > 
> >> > > it crashed on my system (arm64)
> >> > 
> >> > Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> >> > from memblock. arm64 must have memblock freeing happen earlier or something.
> >> > (CCing memblock experts)
> >> > 
> >> > > I ran with boot parameter slub_debug=U, and without KASAN.
> >> > > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> >> > > 
> >> > > void * __init memblock_alloc_try_nid(
> >> > >                         phys_addr_t size, phys_addr_t align,
> >> > >                         phys_addr_t min_addr, phys_addr_t max_addr,
> >> > >                         int nid)
> >> > > {
> >> > >         void *ptr;
> >> > > 
> >> > >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> >> > >                      __func__, (u64)size, (u64)align, nid, &min_addr,
> >> > >                      &max_addr, (void *)_RET_IP_);
> >> > >         ptr = memblock_alloc_internal(size, align,
> >> > >                                            min_addr, max_addr, nid, false);
> >> > >         if (ptr)
> >> > >                 memset(ptr, 0, size); <--- Crash Here
> >> > > 
> >> > >         return ptr;
> >> > > }
> >> > > 
> >> > > It crashed during create_boot_cache() -> stack_depot_init() ->
> >> > > memblock_alloc().
> >> > > 
> >> > > I think That's because, in kmem_cache_init(), both slab and memblock is not
> >> > > available. (AFAIU memblock is not available after mem_init() because of
> >> > > memblock_free_all(), right?)
> >> > 
> >> > Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> >> > But then, I would expect stack_depot_init() to detect that memblock_alloc()
> >> > returns NULL, we print ""Stack Depot hash table allocation failed,
> >> > disabling" and disable it. Instead it seems memblock_alloc() returns
> >> > something that's already potentially used by somebody else? Sounds like a bug?
> >> 
> >> 
> >> By the way, I fixed this by allowing stack_depot_init() to be called in
> >> kmem_cache_init() too [1] and Marco suggested that calling
> >> stack_depot_init() depending on slub_debug parameter for simplicity. [2]
> >> 
> >> I would prefer [2], Would you take a look?
> >> 
> >> [1] https://lkml.org/lkml/2022/2/27/31
> >> 
> >> [2] https://lkml.org/lkml/2022/2/28/717
> > 
> > I have the third version :)
> 
> While simple, it changes the timing of stack_depot_early_init() that was
> supposed to be at a single callsite - now it's less predictable and depends
> on e.g. kernel parameter ordering. Some arch/config combo could break,
> dunno. Setting a variable that stack_depot_early_init() checks should be
> more robust.

Not sure I follow.
stack_depot_early_init() is a wrapper for stack_depot_init() which already
checks 

	if (!stack_depot_disable && !stack_table)

So largely it can be at multiple call sites just like stack_depot_init...

Still, I understand your concern about having multiple call sites for
stack_depot_early_init().

The most robust way I can think of would be to make stack_depot_early_init()
a proper function, move memblock_alloc() there, and add a variable, say
stack_depot_needed_early, that will be set to 1 if
CONFIG_STACKDEPOT_ALWAYS_INIT=y or by the callers that need to allocate the
stack_table before kmalloc is up.
 
E.g

__init int stack_depot_early_init(void)
{

	if (stack_depot_needed_early && !stack_table) {
		size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));

		pr_info("Stack Depot allocating hash table with memblock_alloc\n");
		stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
	
		if (!stack_table) {
			pr_err("Stack Depot hash table allocation failed, disabling\n");
			stack_depot_disable = true;
			return -ENOMEM;
		}
	}

	return 0;
}

The mutex is not needed here because mm_init() -> stack_depot_early_init()
happens before SMP, and setting stack_table[i] to NULL is redundant with
memblock_alloc(). (btw, the kvmalloc case could use __GFP_ZERO as well)
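
I.e. something like this (against the current kvmalloc branch, not a real
patch):

		/* __GFP_ZERO makes the explicit NULLing loop unnecessary;
		 * memblock_alloc() already returns zeroed memory anyway */
		stack_table = kvmalloc(size, GFP_KERNEL | __GFP_ZERO);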

I'm not sure if the stack depot should be disabled for good if the early
allocation failed, but that's another story.

> > diff --git a/mm/slub.c b/mm/slub.c
> > index a74afe59a403..0c3ab2335b46 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1548,6 +1548,10 @@ static int __init setup_slub_debug(char *str)
> >  	}
> >  out:
> >  	slub_debug = global_flags;
> > +
> > +	if (slub_flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> > +		stack_depot_early_init();
> > +
> >  	if (slub_debug != 0 || slub_debug_string)
> >  		static_branch_enable(&slub_debug_enabled);
> >  	else
> > @@ -4221,9 +4225,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
> >  	s->remote_node_defrag_ratio = 1000;
> >  #endif
> >  
> > -	if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> > -		stack_depot_init();
> > -
> >  	/* Initialize the pre-computed randomized freelist if slab is up */
> >  	if (slab_state >= UP) {
> >  		if (init_cache_random_seq(s))
> >  
> >> -- 
> >> Thank you, You are awesome!
> >> Hyeonggon :-)
> > 
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 5/5] slab, documentation: add description of debugfs files for SLUB caches
  2022-02-27  3:49   ` Hyeonggon Yoo
@ 2022-03-02 16:31     ` Vlastimil Babka
  0 siblings, 0 replies; 46+ messages in thread
From: Vlastimil Babka @ 2022-03-02 16:31 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet, Randy Dunlap,
	linux-doc

On 2/27/22 04:49, Hyeonggon Yoo wrote:
> I think it's not traces of "currently free objects"
> because index bit of free objects are set in obj_map bitmap?

Hm right, thanks.

> It's weird but it's traces of allocated objects that have been freed at
> least once (or <not available>)
> 
> I think we can fix the code or doc?

For now I'll fix the doc. It's not entirely clear to me what the best use case
for the free_traces file is. For alloc_traces it's clearly debugging memory
leaks. Freeing traces are most useful when a bug is detected and they are
dumped in dmesg. The debugfs file might just give a rough idea of where
freeing usually happens.

> Please tell me if I'm missing something :)
> 
>> +    Information in the output:
>> +    Number of objects, freeing function, minimal/average/maximal jiffies since free,
>> +    pid range of the freeing processes, cpu mask of freeing cpus, and stack trace.
>> +
>> +    Example:::
>> +
>> +    51 acpi_ut_update_ref_count+0x6a6/0x782 age=236886/237027/237772 pid=1 cpus=1
>> +	kfree+0x2db/0x420
>> +	acpi_ut_update_ref_count+0x6a6/0x782
>> +	acpi_ut_update_object_reference+0x1ad/0x234
>> +	acpi_ut_remove_reference+0x7d/0x84
>> +	acpi_rs_get_prt_method_data+0x97/0xd6
>> +	acpi_get_irq_routing_table+0x82/0xc4
>> +	acpi_pci_irq_find_prt_entry+0x8e/0x2e0
>> +	acpi_pci_irq_lookup+0x3a/0x1e0
>> +	acpi_pci_irq_enable+0x77/0x240
>> +	pcibios_enable_device+0x39/0x40
>> +	do_pci_enable_device.part.0+0x5d/0xe0
>> +	pci_enable_device_flags+0xfc/0x120
>> +	pci_enable_device+0x13/0x20
>> +	virtio_pci_probe+0x9e/0x170
>> +	local_pci_probe+0x48/0x80
>> +	pci_device_probe+0x105/0x1c0
>> +
> 
> Everything else looks nice!
> 
>>  Christoph Lameter, May 30, 2007
>>  Sergey Senozhatsky, October 23, 2015
>> -- 
>> 2.35.1
>> 
>> 
> 



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects
  2022-02-27  9:44   ` [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects Hyeonggon Yoo
@ 2022-03-02 16:51     ` Vlastimil Babka
  2022-03-02 17:22       ` Hyeonggon Yoo
  0 siblings, 1 reply; 46+ messages in thread
From: Vlastimil Babka @ 2022-03-02 16:51 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed

On 2/27/22 10:44, Hyeonggon Yoo wrote:
> On Fri, Feb 25, 2022 at 07:03:15PM +0100, Vlastimil Babka wrote:
>> From: Oliver Glitta <glittao@gmail.com>
>> 
>> Many stack traces are similar so there are many similar arrays.
>> Stackdepot saves each unique stack only once.
>>
>> Replace field addrs in struct track with depot_stack_handle_t handle.  Use
>> stackdepot to save stack trace.
>>
> 
> I think it's not a replacement?

It is, for the array 'addrs':

-#ifdef CONFIG_STACKTRACE
-	unsigned long addrs[TRACK_ADDRS_COUNT];	/* Called from address */
+#ifdef CONFIG_STACKDEPOT
+	depot_stack_handle_t handle;

Not to be confused with 'addr', which is the immediate caller and indeed stays
for redundancy / kernels without stack traces enabled.
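
For illustration, the tracking side after the patch looks roughly like this
(simplified sketch, see the actual hunks for the details):

static void set_track(struct kmem_cache *s, void *object,
		      enum track_item alloc, unsigned long addr)
{
	struct track *p = get_track(s, object, alloc);

#ifdef CONFIG_STACKDEPOT
	unsigned long entries[TRACK_ADDRS_COUNT];
	unsigned int nr_entries;

	/* the full stack goes to stackdepot, only the handle is stored */
	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);
	p->handle = stack_depot_save(entries, nr_entries, GFP_NOWAIT);
#endif
	/* 'addr', the immediate caller, stays in the track itself */
	p->addr = addr;
	p->cpu = smp_processor_id();
	p->pid = current->pid;
	p->when = jiffies;
}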

>> The benefits are smaller memory overhead and possibility to aggregate
>> per-cache statistics in the following patch using the stackdepot handle
>> instead of matching stacks manually.
>> 
>> [ vbabka@suse.cz: rebase to 5.17-rc1 and adjust accordingly ]
>> 
>> This was initially merged as commit 788691464c29 and reverted by commit
>> ae14c63a9f20 due to several issues, that should now be fixed.
>> The problem of unconditional memory overhead by stackdepot has been
>> addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init
>> and stack_table allocation by kvmalloc()"), so the dependency on
>> stackdepot will result in extra memory usage only when a slab cache
>> tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
>> The build failures on some architectures were also addressed, and the
>> reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this
>> patch.
> 
> This is just an idea and beyond this patch.
> 
> After this patch, now we have external storage that records stack traces.

Well, we had it before this patch too.

> It's possible that some rare stack traces are in stack depot, but
> not reachable because track is overwritten.

Yes.

> I think it's worth implementing a way to iterate through stacks in stack depot?

The question is: for what use case? We might not even know who stored
them - it could have been page_owner, or other stack depot users. But the
point is usually not to learn about all existing traces, but to
determine which ones cause an object lifetime bug or a memory leak.

>> 
>> Signed-off-by: Oliver Glitta <glittao@gmail.com>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: Christoph Lameter <cl@linux.com>
>> Cc: Pekka Enberg <penberg@kernel.org>
>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot
  2022-03-02 12:30           ` Mike Rapoport
@ 2022-03-02 17:02             ` Hyeonggon Yoo
  2022-03-02 17:27               ` Marco Elver
  0 siblings, 1 reply; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-03-02 17:02 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Vlastimil Babka, David Rientjes, Christoph Lameter, Joonsoo Kim,
	Pekka Enberg, Roman Gushchin, Andrew Morton, linux-mm, patches,
	linux-kernel, Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet,
	Randy Dunlap, Marco Elver, Karolina Drobnik

On Wed, Mar 02, 2022 at 02:30:56PM +0200, Mike Rapoport wrote:
> On Wed, Mar 02, 2022 at 10:09:37AM +0100, Vlastimil Babka wrote:
> > On 3/2/22 09:37, Mike Rapoport wrote:
> > > On Mon, Feb 28, 2022 at 09:27:02PM +0000, Hyeonggon Yoo wrote:
> > >> On Mon, Feb 28, 2022 at 08:10:18PM +0100, Vlastimil Babka wrote:
> > >> > On 2/26/22 08:19, Hyeonggon Yoo wrote:
> > >> > > On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
> > >> > >> Hi,
> > >> > >> 
> > >> > >> this series combines and revives patches from Oliver's last year
> > >> > >> bachelor thesis (where I was the advisor) that make SLUB's debugfs
> > >> > >> files alloc_traces and free_traces more useful.
> > >> > >> The resubmission was blocked on stackdepot changes that are now merged,
> > >> > >> as explained in patch 2.
> > >> > >> 
> > >> > > 
> > >> > > Hello. I just started review/testing this series.
> > >> > > 
> > >> > > it crashed on my system (arm64)
> > >> > 
> > >> > Hmm, interesting. On x86_64 this works for me and stackdepot is allocated
> > >> > from memblock. arm64 must have memblock freeing happen earlier or something.
> > >> > (CCing memblock experts)
> > >> > 
> > >> > > I ran with boot parameter slub_debug=U, and without KASAN.
> > >> > > So CONFIG_STACKDEPOT_ALWAYS_INIT=n.
> > >> > > 
> > >> > > void * __init memblock_alloc_try_nid(
> > >> > >                         phys_addr_t size, phys_addr_t align,
> > >> > >                         phys_addr_t min_addr, phys_addr_t max_addr,
> > >> > >                         int nid)
> > >> > > {
> > >> > >         void *ptr;
> > >> > > 
> > >> > >         memblock_dbg("%s: %llu bytes align=0x%llx nid=%d from=%pa max_addr=%pa %pS\n",
> > >> > >                      __func__, (u64)size, (u64)align, nid, &min_addr,
> > >> > >                      &max_addr, (void *)_RET_IP_);
> > >> > >         ptr = memblock_alloc_internal(size, align,
> > >> > >                                            min_addr, max_addr, nid, false);
> > >> > >         if (ptr)
> > >> > >                 memset(ptr, 0, size); <--- Crash Here
> > >> > > 
> > >> > >         return ptr;
> > >> > > }
> > >> > > 
> > >> > > It crashed during create_boot_cache() -> stack_depot_init() ->
> > >> > > memblock_alloc().
> > >> > > 
> > >> > > I think That's because, in kmem_cache_init(), both slab and memblock is not
> > >> > > available. (AFAIU memblock is not available after mem_init() because of
> > >> > > memblock_free_all(), right?)
> > >> > 
> > >> > Hm yes I see, even in x86_64 version mem_init() calls memblock_free_all().
> > >> > But then, I would expect stack_depot_init() to detect that memblock_alloc()
> > >> > returns NULL, we print ""Stack Depot hash table allocation failed,
> > >> > disabling" and disable it. Instead it seems memblock_alloc() returns
> > >> > something that's already potentially used by somebody else? Sounds like a bug?
> > >> 
> > >> 
> > >> By the way, I fixed this by allowing stack_depot_init() to be called in
> > >> kmem_cache_init() too [1] and Marco suggested that calling
> > >> stack_depot_init() depending on slub_debug parameter for simplicity. [2]
> > >> 
> > >> I would prefer [2], Would you take a look?
> > >> 
> > >> [1] https://lkml.org/lkml/2022/2/27/31
> > >> 
> > >> [2] https://lkml.org/lkml/2022/2/28/717
> > > 
> > > I have the third version :)
> > 
> > While simple, it changes the timing of stack_depot_early_init() that was
> > supposed to be at a single callsite - now it's less predictable and depends
> > on e.g. kernel parameter ordering. Some arch/config combo could break,
> > dunno. Setting a variable that stack_depot_early_init() checks should be
> > more robust.
> 
> Not sure I follow.
> stack_depot_early_init() is a wrapper for stack_depot_init() which already
> checks 
> 
> 	if (!stack_depot_disable && !stack_table)
> 
> So largely it can be at multiple call sites just like stack_depot_init...

In my opinion, allowing stack_depot_init() to be called in various contexts is
not a good idea. As another simple example, slub_debug=U,vmap_area can fool the
current code because it's called in a context where slab is available
but vmalloc is not, and stack_depot_init() will then try to allocate
using kvmalloc(). Late initialization adds too much complexity.

So IMO we have two solutions.

The first solution is to allow only early init and avoid late init
(setting a global variable that is visible to stack depot would do this).

The second solution is to make each caller allocate and manage its own hash
table. All of this complexity is because we're trying to make stack_table
global.

The first solution looks ok if we have few users of stack depot.
But shouldn't we use the second approach if stack depot keeps gaining
more and more callers?
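
(Just to illustrate what the second option would mean for the API - this is
completely hypothetical, nothing like it exists today:)

/* invented names, caller-owned hash table */
struct stack_depot;

struct stack_depot *stack_depot_create(unsigned int hash_order, gfp_t gfp);
depot_stack_handle_t stack_depot_save_to(struct stack_depot *depot,
					 unsigned long *entries,
					 unsigned int nr_entries, gfp_t gfp);

/* and SLUB would then do something like: */
	if (s->flags & SLAB_STORE_USER)
		s->stack_depot = stack_depot_create(STACK_HASH_ORDER, GFP_KERNEL);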

> Still, I understand your concern of having multiple call sites for
> stack_depot_early_init().
> 
> The most robust way I can think of will be to make stack_depot_early_init()
> a proper function, move memblock_alloc() there and add a variable, say
> stack_depot_needed_early that will be set to 1 if
> CONFIG_STACKDEPOT_ALWAYS_INIT=y or by the callers that need to allocate the
> stack_table before kmalloc is up.
>  
> E.g
>
> __init int stack_depot_early_init(void)
> {
> 
> 	if (stack_depot_needed_early && !stack_table) {
> 		size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
> 		int i;
> 
> 		pr_info("Stack Depot allocating hash table with memblock_alloc\n");
> 		stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
> 	
> 		if (!stack_table) {
> 			pr_err("Stack Depot hash table allocation failed, disabling\n");
> 			stack_depot_disable = true;
> 			return -ENOMEM;
> 		}
> 	}
> 
> 	return 0;
> }
>
> The mutex is not needed here because mm_init() -> stack_depot_early_init()
> happens before SMP and setting stack_table[i] to NULL is redundant with
> memblock_alloc(). (btw, kvmalloc case could use __GFP_ZERO as well).
> 
> I'm not sure if the stack depot should be disabled for good if the early
> allocation failed, but that's another story.
> 
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index a74afe59a403..0c3ab2335b46 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -1548,6 +1548,10 @@ static int __init setup_slub_debug(char *str)
> > >  	}
> > >  out:
> > >  	slub_debug = global_flags;
> > > +
> > > +	if (slub_flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> > > +		stack_depot_early_init();
> > > +
> > >  	if (slub_debug != 0 || slub_debug_string)
> > >  		static_branch_enable(&slub_debug_enabled);
> > >  	else
> > > @@ -4221,9 +4225,6 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
> > >  	s->remote_node_defrag_ratio = 1000;
> > >  #endif
> > >  
> > > -	if (s->flags & SLAB_STORE_USER && IS_ENABLED(CONFIG_STACKDEPOT))
> > > -		stack_depot_init();
> > > -
> > >  	/* Initialize the pre-computed randomized freelist if slab is up */
> > >  	if (slab_state >= UP) {
> > >  		if (init_cache_random_seq(s))
> > >  
> > >> -- 
> > >> Thank you, You are awesome!
> > >> Hyeonggon :-)
> > > 
> > 
> 
> -- 
> Sincerely yours,
> Mike.

-- 
Thank you, You are awesome!
Hyeonggon :-)


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects
  2022-03-02 16:51     ` Vlastimil Babka
@ 2022-03-02 17:22       ` Hyeonggon Yoo
  0 siblings, 0 replies; 46+ messages in thread
From: Hyeonggon Yoo @ 2022-03-02 17:22 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed

On Wed, Mar 02, 2022 at 05:51:32PM +0100, Vlastimil Babka wrote:
> On 2/27/22 10:44, Hyeonggon Yoo wrote:
> > On Fri, Feb 25, 2022 at 07:03:15PM +0100, Vlastimil Babka wrote:
> >> From: Oliver Glitta <glittao@gmail.com>
> >> 
> >> Many stack traces are similar so there are many similar arrays.
> >> Stackdepot saves each unique stack only once.
> >>
> >> Replace field addrs in struct track with depot_stack_handle_t handle.  Use
> >> stackdepot to save stack trace.
> >>
> > 
> > I think it's not a replacement?
> 
> It is, for the array 'addrs':
> 
> -#ifdef CONFIG_STACKTRACE
> -	unsigned long addrs[TRACK_ADDRS_COUNT];	/* Called from address */
> +#ifdef CONFIG_STACKDEPOT
> +	depot_stack_handle_t handle;
> 
> Not confuse with 'addr' which is the immediate caller and indeed stays
> for redundancy/kernels without stack trace enabled.
>

Oh, my fault. Right. I was confused.
I should read it again.

> >> The benefits are smaller memory overhead and possibility to aggregate
> >> per-cache statistics in the following patch using the stackdepot handle
> >> instead of matching stacks manually.
> >> 
> >> [ vbabka@suse.cz: rebase to 5.17-rc1 and adjust accordingly ]
> >> 
> >> This was initially merged as commit 788691464c29 and reverted by commit
> >> ae14c63a9f20 due to several issues, that should now be fixed.
> >> The problem of unconditional memory overhead by stackdepot has been
> >> addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init
> >> and stack_table allocation by kvmalloc()"), so the dependency on
> >> stackdepot will result in extra memory usage only when a slab cache
> >> tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
> >> The build failures on some architectures were also addressed, and the
> >> reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this
> >> patch.
> > 
> > This is just an idea and beyond this patch.
> > 
> > After this patch, now we have external storage that records stack traces.
> 
> Well, we had it before this patch too.
>
> > It's possible that some rare stack traces are in stack depot, but
> > not reachable because track is overwritten.
> 
> Yes.
> 
> > I think it's worth implementing a way to iterate through stacks in stack depot?
> 
> The question is: for what use case? We might not even know who stored
> them - it could have been page_owner, or other stack depot users.

> But the point is usually not to learn about all existing traces, but to
> determine which ones cause an object lifetime bug, or memory leak.

Yeah, this is exactly what I misunderstood.
I thought the purpose of free_traces was to show all existing traces,
but I realized today that a free trace without the matching alloc trace
is not very useful on its own.
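
So for chasing a leak or a lifetime bug one would read the two files
side by side, e.g. (assuming debugfs is mounted at /sys/kernel/debug and
the kernel was booted with tracking enabled for the cache, e.g.
slub_debug=U,dentry):

	# most frequent allocation call paths still holding objects
	head -40 /sys/kernel/debug/slab/dentry/alloc_traces
	# most frequent free call paths, to compare against the allocations
	head -40 /sys/kernel/debug/slab/dentry/free_traces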

I'll review v2 with these in mind.
Thank you.

> >> 
> >> Signed-off-by: Oliver Glitta <glittao@gmail.com>
> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >> Cc: David Rientjes <rientjes@google.com>
> >> Cc: Christoph Lameter <cl@linux.com>
> >> Cc: Pekka Enberg <penberg@kernel.org>
> >> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > 
> 

-- 
Thank you, You are awesome!
Hyeonggon :-)


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot
  2022-03-02 17:02             ` Hyeonggon Yoo
@ 2022-03-02 17:27               ` Marco Elver
  0 siblings, 0 replies; 46+ messages in thread
From: Marco Elver @ 2022-03-02 17:27 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Mike Rapoport, Vlastimil Babka, David Rientjes,
	Christoph Lameter, Joonsoo Kim, Pekka Enberg, Roman Gushchin,
	Andrew Morton, linux-mm, patches, linux-kernel, Oliver Glitta,
	Faiyaz Mohammed, Jonathan Corbet, Randy Dunlap, Karolina Drobnik

On Wed, 2 Mar 2022 at 18:02, Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
[...]
> So IMO we have two solutions.
>
> First solution is only allowing early init and avoiding late init.
> (setting a global variable that is visible to stack depot would do this)
>
> And second solution is to make caller allocate and manage its own hash
> table. All of this complexity is because we're trying to make stack_table
> global.

I think this would be a mistake, because then we have to continuously
audit all users of stackdepot and make sure that allocation stack
traces don't end up in duplicate hash tables. It's global for a
reason.

> The first solution looks OK if we have few users of stack depot.
> But I think we should use the second approach if stack depot keeps
> gaining more and more callers?

The problem here really is just that initialization of stackdepot and
slabs can have a cyclic dependency with the changes you're making. I
very much doubt there'll be other cases (beyond the allocator itself
used by stackdepot) which can introduce such a cyclic dependency.

The easiest way to break the cyclic dependency is to initialize
stackdepot earlier, assuming it can be determined it is required (in
this case it can because the command line is parsed before slab
creation). The suggestion with the stack_depot_needed_early variable
(like Mike's suggested code) would solve all that.
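
To make it concrete, the rough shape I have in mind is the sketch below
(untested; stack_depot_needed_early and the memblock allocation are
placeholders for whatever Mike's patch ends up doing, not an existing
interface):

/* lib/stackdepot.c - sketch only */
bool stack_depot_needed_early __initdata;

int __init stack_depot_early_init(void)
{
	/*
	 * Called from generic init code after the command line has been
	 * parsed but before the first kmem_cache exists, so memblock is
	 * still available for the hash table allocation.
	 */
	if (!stack_depot_needed_early || stack_table)
		return 0;

	stack_table = memblock_alloc(STACK_HASH_SIZE * sizeof(*stack_table),
				     PAGE_SIZE);
	return stack_table ? 0 : -ENOMEM;
}

/* mm/slub.c, in setup_slub_debug(), which only has to flip the flag: */
if (IS_ENABLED(CONFIG_STACKDEPOT) && (slub_debug & SLAB_STORE_USER))
	stack_depot_needed_early = true;

That way stackdepot stays global and SLUB never has to allocate anything
itself; all it needs to do at command-line-parsing time is say that it
will want stackdepot.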

I don't understand the concern about multiple contexts. The problem is
just about a cyclic dependency during early init, and I doubt we'll
have more of that.

Thanks,
-- Marco


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/5] SLUB debugfs improvements based on stackdepot
  2022-02-26 12:18 ` Hyeonggon Yoo
@ 2022-03-04 17:25   ` Vlastimil Babka
  0 siblings, 0 replies; 46+ messages in thread
From: Vlastimil Babka @ 2022-03-04 17:25 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: David Rientjes, Christoph Lameter, Joonsoo Kim, Pekka Enberg,
	Roman Gushchin, Andrew Morton, linux-mm, patches, linux-kernel,
	Oliver Glitta, Faiyaz Mohammed, Jonathan Corbet, Randy Dunlap

On 2/26/22 13:18, Hyeonggon Yoo wrote:
> On Fri, Feb 25, 2022 at 07:03:13PM +0100, Vlastimil Babka wrote:
>> [...]
> 
> This problem is not caused by this patch series.
> But I think it's worth mentioning...
> 
> It's really weird that some stack traces are not recorded
> when CONFIG_KASAN=y.
> 
> I made sure that:
> 	- Stack Depot did not reach its limit
> 	- the free path happens on CONFIG_KASAN=y too.
> 
> I have no clue why this happens.
> 
> # cat dentry/free_traces (CONFIG_KASAN=y)
>    6585 <not-available> age=4294912647 pid=0 cpus=0

I think it's KASAN quarantining the freed objects: they haven't actually
been freed through the SLUB layer yet, so no free track has been recorded
for them.
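
Roughly, as a toy model of the effect (not the real mm/slub or mm/kasan
code; kasan_quarantine_put() is the real quarantine entry point, the rest
of the names are stand-ins):

static void free_with_kasan_sketch(struct kmem_cache *s, void *object)
{
	/*
	 * With generic KASAN most frees are deferred: the object is poisoned
	 * and parked in the quarantine instead of being given back to SLUB,
	 * so set_track(TRACK_FREE) never runs at this point.
	 */
	if (IS_ENABLED(CONFIG_KASAN_GENERIC) &&
	    kasan_quarantine_put(s, object))
		return;

	/*
	 * Only objects that were never quarantined, or that later leave the
	 * quarantine, reach the real SLUB free path and get a free track.
	 */
	set_track(s, object, TRACK_FREE, _RET_IP_);
	real_slub_free(s, object);	/* stand-in for the actual free */
}

So once the quarantine is shrunk and the objects are pushed through the
real free, the traces should start showing up.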

> # cat dentry/free_traces (CONFIG_KASAN=n)
>    1246 <not-available> age=4294906877 pid=0 cpus=0
>     379 __d_free+0x20/0x2c age=33/14225/14353 pid=0-122 cpus=0-3
>         kmem_cache_free+0x1f4/0x21c
>         __d_free+0x20/0x2c
>         rcu_core+0x334/0x580
>         rcu_core_si+0x14/0x20
>         __do_softirq+0x12c/0x2a8
> 
>       2 dentry_free+0x58/0xb0 age=14101/14101/14101 pid=158 cpus=0
>         kmem_cache_free+0x1f4/0x21c
>         dentry_free+0x58/0xb0
>         __dentry_kill+0x18c/0x1d0
>         dput+0x1c4/0x2fc
>         __fput+0xb0/0x230
>         ____fput+0x14/0x20
>         task_work_run+0x84/0x17c
>         do_notify_resume+0x208/0x1330
>         el0_svc+0x6c/0x80
>         el0t_64_sync_handler+0xa8/0x130
>         el0t_64_sync+0x1a0/0x1a4
> 
>       1 dentry_free+0x58/0xb0 age=7678 pid=190 cpus=1
>         kmem_cache_free+0x1f4/0x21c
>         dentry_free+0x58/0xb0
>         __dentry_kill+0x18c/0x1d0
>         dput+0x1c4/0x2fc
>         __fput+0xb0/0x230
>         ____fput+0x14/0x20
>         task_work_run+0x84/0x17c
>         do_exit+0x2dc/0x8e0
>         do_group_exit+0x38/0xa4
>         __wake_up_parent+0x0/0x34
>         invoke_syscall+0x48/0x114
>         el0_svc_common.constprop.0+0x44/0xfc
>         do_el0_svc+0x2c/0x94
>         el0_svc+0x28/0x80
>         el0t_64_sync_handler+0xa8/0x130
>         el0t_64_sync+0x1a0/0x1a4



^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread

Thread overview: 46+ messages
2022-02-25 18:03 [PATCH 0/5] SLUB debugfs improvements based on stackdepot Vlastimil Babka
2022-02-25 18:03 ` [PATCH 1/5] mm/slub: move struct track init out of set_track() Vlastimil Babka
2022-02-26 10:41   ` Hyeonggon Yoo
2022-02-25 18:03 ` [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects Vlastimil Babka
2022-02-26 10:24   ` Hyeonggon Yoo
2022-02-28 18:44     ` Vlastimil Babka
2022-02-27  3:08   ` [PATCH] lib/stackdepot: Use page allocator if both slab and memblock is unavailable Hyeonggon Yoo
2022-02-27  5:06     ` kernel test robot
2022-02-27  9:23     ` [PATCH v2] " Hyeonggon Yoo
2022-02-27 10:00     ` [PATCH] " kernel test robot
2022-02-28  7:00     ` Marco Elver
2022-02-28 10:05       ` Hyeonggon Yoo
2022-02-28 10:50         ` Marco Elver
2022-02-28 11:48           ` Hyeonggon Yoo
2022-02-28 15:09           ` [PATCH] mm/slub: initialize stack depot in boot process Hyeonggon Yoo
2022-02-28 16:28             ` Marco Elver
2022-03-01  2:12               ` Hyeonggon Yoo
2022-03-01  0:28             ` Vlastimil Babka
2022-02-27  9:44   ` [PATCH 2/5] mm/slub: use stackdepot to save stack trace in objects Hyeonggon Yoo
2022-03-02 16:51     ` Vlastimil Babka
2022-03-02 17:22       ` Hyeonggon Yoo
2022-02-25 18:03 ` [PATCH 3/5] mm/slub: aggregate and print stack traces in debugfs files Vlastimil Babka
2022-02-27  0:18   ` Hyeonggon Yoo
2022-02-27  0:22   ` Hyeonggon Yoo
2022-02-25 18:03 ` [PATCH 4/5] mm/slub: sort debugfs output by frequency of stack traces Vlastimil Babka
2022-02-26 11:03   ` Hyeonggon Yoo
2022-02-25 18:03 ` [PATCH 5/5] slab, documentation: add description of debugfs files for SLUB caches Vlastimil Babka
2022-02-27  3:49   ` Hyeonggon Yoo
2022-03-02 16:31     ` Vlastimil Babka
2022-02-26  7:19 ` [PATCH 0/5] SLUB debugfs improvements based on stackdepot Hyeonggon Yoo
2022-02-28 19:10   ` Vlastimil Babka
2022-02-28 20:01     ` Mike Rapoport
2022-02-28 21:20       ` Hyeonggon Yoo
2022-02-28 23:38       ` Vlastimil Babka
2022-03-01  9:21         ` Mike Rapoport
2022-03-01  9:41           ` Vlastimil Babka
2022-03-01 14:52             ` Mike Rapoport
2022-02-28 21:27     ` Hyeonggon Yoo
2022-03-01  9:23       ` Mike Rapoport
2022-03-02  8:37       ` Mike Rapoport
2022-03-02  9:09         ` Vlastimil Babka
2022-03-02 12:30           ` Mike Rapoport
2022-03-02 17:02             ` Hyeonggon Yoo
2022-03-02 17:27               ` Marco Elver
2022-02-26 12:18 ` Hyeonggon Yoo
2022-03-04 17:25   ` Vlastimil Babka
