rcu.vger.kernel.org archive mirror
* [PATCH RFC sl-b] Export return addresses for better diagnostics
From: Paul E. McKenney @ 2020-12-05  0:40 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel

Hello!

This series improves diagnostics by providing access to return addresses
and stack traces collected by the sl*b allocators.  If the allocator is
not configured to collect this information, the diagnostics fall back
to a reasonable approximation of their earlier state.

One use case is the queue_rcu_work() function, which might be used
by any number of kernel subsystems.  If the caller does back-to-back
invocations of queue_rcu_work(), this constitutes a double-free bug,
and (if so configured) the debug-objects system will flag this, printing
the callback function.  In most cases, printing this function suffices.
However, for double-free bugs involving queue_rcu_work(), the RCU callback
function will always be rcu_work_rcufn(), which provides almost no help to
the poor person trying to find this double-free bug.  The return address
from the allocator of the memory containing the rcu_work structure can
provide an additional valuable clue.
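
For concreteness, here is a minimal sketch of the buggy pattern, with
hypothetical struct foo and foo_*() stand-ins; only INIT_RCU_WORK(),
queue_rcu_work(), and to_rcu_work() are real APIs:

#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct foo {
        struct rcu_work rwork;
        /* ... subsystem data ... */
};

static void foo_workfn(struct work_struct *work)
{
        struct foo *fp = container_of(to_rcu_work(work), struct foo, rwork);

        kfree(fp);
}

static void foo_teardown(struct foo *fp)
{
        INIT_RCU_WORK(&fp->rwork, foo_workfn);
        queue_rcu_work(system_wq, &fp->rwork);
        /* BUG: Back-to-back invocation.  Debug-objects flags it, but
         * can name only rcu_work_rcufn() as the callback. */
        queue_rcu_work(system_wq, &fp->rwork);
}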

Another use case is the percpu_ref_switch_to_atomic_rcu() function,
which detects percpu_ref reference-count underflow.  Unfortunately,
the only data that this function has access to doesn't have much in the
way of identifying characteristics.  Yes, it might be possible to gain
more information from a crash dump, but it is more convenient for the
needed hints to be in the console log.

Unfortunately, printing the return address in this case is of little help
because this object is allocated from percpu_ref_init(), regardless of
what part of the kernel is responsible for the reference-count underflow.
However, CONFIG_STACKTRACE=y kernels (such as those enabling ftrace)
using slub with debugging enabled also collect stack traces.  This series
therefore also provides a way of extracting these stack traces to provide
additional information to those debugging percpu_ref reference-count
underflows.
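
For illustration, the sort of bug being diagnosed might look as follows,
with hypothetical bar_*() names but real percpu_ref API calls:

#include <linux/gfp.h>
#include <linux/percpu-refcount.h>

static void bar_release(struct percpu_ref *ref)
{
        /* Free the structure enclosing ref. */
}

static int bar_example(struct percpu_ref *ref)
{
        int ret = percpu_ref_init(ref, bar_release, 0, GFP_KERNEL);

        if (ret)
                return ret;
        percpu_ref_get(ref);
        percpu_ref_put(ref);
        percpu_ref_put(ref);  /* BUG: Extra put without matching get. */
        /* The underflow is reported only later, from within
         * percpu_ref_switch_to_atomic_rcu(), far from this code. */
        percpu_ref_kill(ref);
        return 0;
}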

The patches are as follows:

1.	Add kmem_last_alloc() to return last allocation for memory block.

2.	Add kmem_last_alloc_errstring() to provide more kmem_last_alloc()
	info.

3.	Make call_rcu() print allocation address of double-freed callback.

4.	Create kmem_last_alloc_stack() to provide stack trace in slub.

5.	percpu_ref: Print allocator upon reference-count underflow.

6.	percpu_ref: Print stack trace upon reference-count underflow.

						Thanx, Paul

------------------------------------------------------------------------

 include/linux/slab.h  |   15 ++++++++
 kernel/rcu/tree.c     |   12 +++++-
 lib/percpu-refcount.c |   39 ++++++++++++++++------
 mm/slab.c             |   61 +++++++++++++++++++++++------------
 mm/slab_common.c      |   87 ++++++++++++++++++++++++++++++++++++++++++++------
 mm/slob.c             |   11 +++++-
 mm/slub.c             |   44 +++++++++++++++++++++++--
 7 files changed, 222 insertions(+), 47 deletions(-)

* [PATCH sl-b 1/6] mm: Add kmem_last_alloc() to return last allocation for memory block
From: paulmck @ 2020-12-05  0:40 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	linux-mm

From: "Paul E. McKenney" <paulmck@kernel.org>

There are kernel facilities such as per-CPU reference counts that give
error messages in generic handlers or callbacks, whose messages are
unenlightening.  In the case of per-CPU reference-count underflow, this
is not a problem when creating a new use of this facility because in that
case the bug is almost certainly in the code implementing that new use.
However, trouble arises when deploying across many systems, which might
exercise corner cases that were not seen during development and testing.
Here, it would be really nice to get some kind of hint as to which of
several uses the underflow was caused by.

This commit therefore exposes a new kmem_last_alloc() function that
takes a pointer to dynamically allocated memory and returns the return
address of the call that allocated it.  This pointer can reference the
middle of the block as well as the beginning of the block, as needed
by things like RCU callback functions and timer handlers that might not
know where the beginning of the memory block is.  These functions and
handlers can use the return value from kmem_last_alloc() to give the
kernel hacker a better hint as to where the problem might lie.

This kmem_last_alloc() function returns NULL for slob and when the
necessary debug has not been enabled for slab and slub.  For slub,
build with CONFIG_SLUB_DEBUG=y and boot with slub_debug=U, or pass
SLAB_STORE_USER to kmem_cache_create() if more focused use is desired.
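
For example, a diagnostic that has only a pointer in hand might use the
new function as follows, where report_provenance() is a hypothetical
caller:

#include <linux/printk.h>
#include <linux/slab.h>

static void report_provenance(void *ptr)
{
        void *allocaddr = kmem_last_alloc(ptr);

        if (allocaddr)
                pr_err("%p allocated at %pS\n", ptr, allocaddr);
        else
                pr_err("%p: allocation information unavailable\n", ptr);
}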

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/slab.h |  2 ++
 mm/slab.c            | 19 +++++++++++++++++++
 mm/slab_common.c     | 20 ++++++++++++++++++++
 mm/slob.c            |  5 +++++
 mm/slub.c            | 26 ++++++++++++++++++++++++++
 5 files changed, 72 insertions(+)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index dd6897f..06dd56b 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -186,6 +186,8 @@ void kfree(const void *);
 void kfree_sensitive(const void *);
 size_t __ksize(const void *);
 size_t ksize(const void *);
+void *kmem_cache_last_alloc(struct kmem_cache *s, void *object);
+void *kmem_last_alloc(void *object);
 
 #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
 void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
diff --git a/mm/slab.c b/mm/slab.c
index b111356..2ab93b8 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3602,6 +3602,25 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
 EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
 #endif
 
+void *kmem_cache_last_alloc(struct kmem_cache *cachep, void *object)
+{
+#ifdef DEBUG
+	unsigned int objnr;
+	void *objp;
+	struct page *page;
+
+	if (!(cachep->flags & SLAB_STORE_USER))
+		return NULL;
+	objp = object - obj_offset(cachep);
+	page = virt_to_head_page(objp);
+	objnr = obj_to_index(cachep, page, objp);
+	objp = index_to_obj(cachep, page, objnr);
+	return *dbg_userword(cachep, objp);
+#else
+	return NULL;
+#endif
+}
+
 static __always_inline void *
 __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
 {
diff --git a/mm/slab_common.c b/mm/slab_common.c
index f9ccd5d..3f647982 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -536,6 +536,26 @@ bool slab_is_available(void)
 	return slab_state >= UP;
 }
 
+/*
+ * If the pointer references a slab-allocated object and if sufficient
+ * debugging is enabled, return the return address for the corresponding
+ * allocation.  Otherwise, return NULL.  Note that passing random pointers
+ * to this function (including addresses of on-stack variables) is likely
+ * to result in panics.
+ */
+void *kmem_last_alloc(void *object)
+{
+	struct page *page;
+
+	if (!virt_addr_valid(object))
+		return NULL;
+	page = virt_to_head_page(object);
+	if (!PageSlab(page))
+		return NULL;
+	return kmem_cache_last_alloc(page->slab_cache, object);
+}
+EXPORT_SYMBOL_GPL(kmem_last_alloc);
+
 #ifndef CONFIG_SLOB
 /* Create a cache during boot when no slab services are available yet */
 void __init create_boot_cache(struct kmem_cache *s, const char *name,
diff --git a/mm/slob.c b/mm/slob.c
index 7cc9805..c1f8ed7 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -461,6 +461,11 @@ static void slob_free(void *block, int size)
 	spin_unlock_irqrestore(&slob_lock, flags);
 }
 
+void *kmem_cache_last_alloc(struct kmem_cache *s, void *object)
+{
+	return NULL;
+}
+
 /*
  * End of slob allocator proper. Begin kmem_cache_alloc and kmalloc frontend.
  */
diff --git a/mm/slub.c b/mm/slub.c
index b30be23..8ed3ba2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3918,6 +3918,32 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
 	return 0;
 }
 
+void *kmem_cache_last_alloc(struct kmem_cache *s, void *object)
+{
+#ifdef CONFIG_SLUB_DEBUG
+	void *base;
+	unsigned int objnr;
+	void *objp;
+	struct page *page;
+	struct track *trackp;
+
+	if (!(s->flags & SLAB_STORE_USER))
+		return NULL;
+	page = virt_to_head_page(object);
+	base = page_address(page);
+	objp = kasan_reset_tag(object);
+	objp = restore_red_left(s, objp);
+	objnr = obj_to_index(s, page, objp);
+	objp = base + s->size * objnr;
+	if (objp < base || objp >= base + page->objects * s->size || (objp - base) % s->size)
+		return NULL;
+	trackp = get_track(s, objp, TRACK_ALLOC);
+	return (void *)trackp->addr;
+#else
+	return NULL;
+#endif
+}
+
 /********************************************************************
  *		Kmalloc subsystem
  *******************************************************************/
-- 
2.9.5


* [PATCH sl-b 2/6] mm: Add kmem_last_alloc_errstring() to provide more kmem_last_alloc() info
From: paulmck @ 2020-12-05  0:40 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	linux-mm

From: "Paul E. McKenney" <paulmck@kernel.org>

NULL pointers can be useful, but a NULL pointer from kmem_last_alloc()
might be caused by any number of things: a pointer that does not
reference a slab-allocated object, failure to enable the needed
debugging, or a bogus slob block-address computation.  This commit
therefore introduces error codes to the
kmem_last_alloc() function using the ERR_PTR() facility, and also
introduces kmem_last_alloc_errstring(), which translates the error codes
into strings.
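
In sketch form, the intended calling convention is as follows, where
show_alloc_info() is a hypothetical caller; patch 3 in this series
applies the same pattern in __call_rcu():

#include <linux/err.h>
#include <linux/printk.h>
#include <linux/slab.h>

static void show_alloc_info(void *ptr)
{
        void *allocaddr = kmem_last_alloc(ptr);
        const char *allocerr = kmem_last_alloc_errstring(allocaddr);

        if (allocerr)
                pr_err("%p: %s\n", ptr, allocerr);
        else
                pr_err("%p allocated at %pS\n", ptr, allocaddr);
}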

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/slab.h | 10 ++++++++++
 mm/slab.c            |  2 +-
 mm/slab_common.c     | 28 ++++++++++++++++++++++++++--
 mm/slob.c            |  2 +-
 mm/slub.c            |  4 ++--
 5 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 06dd56b..031e630 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -133,6 +133,15 @@
 #define ZERO_OR_NULL_PTR(x) ((unsigned long)(x) <= \
 				(unsigned long)ZERO_SIZE_PTR)
 
+/*
+ * kmem_last_alloc error codes.
+ */
+#define KMEM_LA_NO_PAGE		1  /* No page structure for pointer. */
+#define KMEM_LA_NO_SLAB		2  /* Pointer not from slab allocator. */
+#define KMEM_LA_SLOB		3  /* No debugging info for slob. */
+#define KMEM_LA_NO_DEBUG	4  /* Debugging not enabled for slab/slub. */
+#define KMEM_LA_INCONSISTENT	5  /* Bogus block within slub page. */
+
 #include <linux/kasan.h>
 
 struct mem_cgroup;
@@ -188,6 +197,7 @@ size_t __ksize(const void *);
 size_t ksize(const void *);
 void *kmem_cache_last_alloc(struct kmem_cache *s, void *object);
 void *kmem_last_alloc(void *object);
+const char *kmem_last_alloc_errstring(void *lastalloc);
 
 #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
 void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
diff --git a/mm/slab.c b/mm/slab.c
index 2ab93b8..1f3b263 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3610,7 +3610,7 @@ void *kmem_cache_last_alloc(struct kmem_cache *cachep, void *object)
 	struct page *page;
 
 	if (!(cachep->flags & SLAB_STORE_USER))
-		return NULL;
+		return ERR_PTR(-KMEM_LA_NO_DEBUG);
 	objp = object - obj_offset(cachep);
 	page = virt_to_head_page(objp);
 	objnr = obj_to_index(cachep, page, objp);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 3f647982..8430a14 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -537,6 +537,30 @@ bool slab_is_available(void)
 }
 
 /*
+ * If the pointer corresponds to a kmem_last_alloc() error, return
+ * a pointer to the corresponding string, otherwise NULL.
+ */
+const char *kmem_last_alloc_errstring(void *lastalloc)
+{
+	long klaerrno;
+	static const char * const es[] = {
+		"local memory",			/* KMEM_LA_NO_PAGE - 1 */
+		"non-slab memory",		/* KMEM_LA_NO_SLAB - 1 */
+		"slob doesn't do debug",	/* KMEM_LA_SLOB - 1 */
+		"debugging disabled",		/* KMEM_LA_NO_DEBUG - 1 */
+		"bogus slub block",		/* KMEM_LA_INCONSISTENT - 1 */
+	};
+
+	if (!IS_ERR(lastalloc))
+		return NULL;
+	klaerrno = -PTR_ERR(lastalloc) - 1;
+	if (WARN_ON_ONCE(klaerrno >= ARRAY_SIZE(es)))
+		return "kmem_last_alloc error out of range";
+	return es[klaerrno];
+}
+EXPORT_SYMBOL_GPL(kmem_last_alloc_errstring);
+
+/*
  * If the pointer references a slab-allocated object and if sufficient
 * debugging is enabled, return the return address for the corresponding
  * allocation.  Otherwise, return NULL.  Note that passing random pointers
@@ -548,10 +572,10 @@ void *kmem_last_alloc(void *object)
 	struct page *page;
 
 	if (!virt_addr_valid(object))
-		return NULL;
+		return ERR_PTR(-KMEM_LA_NO_PAGE);
 	page = virt_to_head_page(object);
 	if (!PageSlab(page))
-		return NULL;
+		return ERR_PTR(-KMEM_LA_NO_SLAB);
 	return kmem_cache_last_alloc(page->slab_cache, object);
 }
 EXPORT_SYMBOL_GPL(kmem_last_alloc);
diff --git a/mm/slob.c b/mm/slob.c
index c1f8ed7..e7d6b90 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -463,7 +463,7 @@ static void slob_free(void *block, int size)
 
 void *kmem_cache_last_alloc(struct kmem_cache *s, void *object)
 {
-	return NULL;
+	return ERR_PTR(-KMEM_LA_SLOB);
 }
 
 /*
diff --git a/mm/slub.c b/mm/slub.c
index 8ed3ba2..3ddf16a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3928,7 +3928,7 @@ void *kmem_cache_last_alloc(struct kmem_cache *s, void *object)
 	struct track *trackp;
 
 	if (!(s->flags & SLAB_STORE_USER))
-		return NULL;
+		return ERR_PTR(-KMEM_LA_NO_DEBUG);
 	page = virt_to_head_page(object);
 	base = page_address(page);
 	objp = kasan_reset_tag(object);
@@ -3936,7 +3936,7 @@ void *kmem_cache_last_alloc(struct kmem_cache *s, void *object)
 	objnr = obj_to_index(s, page, objp);
 	objp = base + s->size * objnr;
 	if (objp < base || objp >= base + page->objects * s->size || (objp - base) % s->size)
-		return NULL;
+		return ERR_PTR(-KMEM_LA_INCONSISTENT);
 	trackp = get_track(s, objp, TRACK_ALLOC);
 	return (void *)trackp->addr;
 #else
-- 
2.9.5


* [PATCH sl-b 3/6] rcu: Make call_rcu() print allocation address of double-freed callback
From: paulmck @ 2020-12-05  0:40 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	linux-mm, Andrii Nakryiko

From: "Paul E. McKenney" <paulmck@kernel.org>

The debug-object double-free checks in __call_rcu() print out the
RCU callback function, which is usually sufficient to track down the
double free.  However, all uses of things like queue_rcu_work() will
have the same RCU callback function (rcu_work_rcufn() in this case),
so a diagnostic message for a double queue_rcu_work() needs more than
just the callback function.

This commit therefore prints the last allocation address of the
double-freed callback when the callback is slab-allocated and sufficient
debugging is enabled.  It uses the shiny new kmem_last_alloc() and
kmem_last_alloc_errstring() functions for this purpose.

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b6c9c49..788a072 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2957,6 +2957,8 @@ static void check_cb_ovld(struct rcu_data *rdp)
 static void
 __call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
+	void *allocaddr;
+	const char *allocerr;
 	unsigned long flags;
 	struct rcu_data *rdp;
 	bool was_alldone;
@@ -2970,8 +2972,14 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
 		 * Use rcu:rcu_callback trace event to find the previous
 		 * time callback was passed to __call_rcu().
 		 */
-		WARN_ONCE(1, "__call_rcu(): Double-freed CB %p->%pS()!!!\n",
-			  head, head->func);
+		allocaddr = kmem_last_alloc(head);
+		allocerr = kmem_last_alloc_errstring(allocaddr);
+		if (allocerr)
+			WARN_ONCE(1, "__call_rcu(): Double-freed CB %p->%pS()!!! (%s)\n",
+				  head, head->func, allocerr);
+		else
+			WARN_ONCE(1, "__call_rcu(): Double-freed CB %p->%pS()!!! (Allocated at %pS)\n",
+				  head, head->func, allocaddr);
 		WRITE_ONCE(head->func, rcu_leak_callback);
 		return;
 	}
-- 
2.9.5


* [PATCH sl-b 4/6] mm: Create kmem_last_alloc_stack() to provide stack trace in slub
From: paulmck @ 2020-12-05  0:40 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	linux-mm

From: "Paul E. McKenney" <paulmck@kernel.org>

In some cases, the allocator return address is in a common function,
so more information is needed.  For example, the code detecting a
percpu_ref reference-count underflow has access only to a data structure
that is allocated in percpu_ref_init().  In this case, the return address
from the allocator provides no additional information.

This commit therefore adds stackp and nstackp parameters to
kmem_cache_last_alloc() and creates a new kmem_last_alloc_stack()
function, allowing CONFIG_STACKTRACE=y slub stack traces to be provided
to the caller.

Please note that stack traces cannot be provided unless they are
collected.  Collecting stack traces requires that the kernel: (1) Use
the slub allocator, (2) Be built with CONFIG_STACKTRACE=y (which is the
case when ftrace is configured), and (3) Have slub debugging enabled
one way or another, for example, by booting with the "slub_debug=U"
kernel boot parameter.
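
A caller might use the new interface roughly as follows, where the
dump_alloc_stack() wrapper is hypothetical; patch 6 in this series uses
the same pattern in percpu_ref_switch_to_atomic_rcu():

#include <linux/err.h>
#include <linux/kernel.h>
#include <linux/printk.h>
#include <linux/slab.h>

static void dump_alloc_stack(void *ptr)
{
        void *stack[8];
        void *allocaddr;
        int i;

        allocaddr = kmem_last_alloc_stack(ptr, stack, ARRAY_SIZE(stack));
        if (IS_ERR_OR_NULL(allocaddr))
                return;  /* No or invalid allocation information. */
        pr_err("%p allocated at %pS\n", ptr, allocaddr);
        for (i = 0; i < ARRAY_SIZE(stack) && stack[i]; i++)
                pr_err("\t%pS\n", stack[i]);
}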

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
[ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/slab.h |  3 ++-
 mm/slab.c            | 40 +++++++++++++++++++++-------------------
 mm/slab_common.c     | 39 ++++++++++++++++++++++++++++++++-------
 mm/slob.c            |  4 +++-
 mm/slub.c            | 14 +++++++++++++-
 5 files changed, 71 insertions(+), 29 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 031e630..bdedefd 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -195,8 +195,9 @@ void kfree(const void *);
 void kfree_sensitive(const void *);
 size_t __ksize(const void *);
 size_t ksize(const void *);
-void *kmem_cache_last_alloc(struct kmem_cache *s, void *object);
+void *kmem_cache_last_alloc(struct kmem_cache *s, void *object, void **stackp, int nstackp);
 void *kmem_last_alloc(void *object);
+void *kmem_last_alloc_stack(void *object, void **stackp, int nstackp);
 const char *kmem_last_alloc_errstring(void *lastalloc);
 
 #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
diff --git a/mm/slab.c b/mm/slab.c
index 1f3b263..ae1a74c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3602,25 +3602,6 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
 EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
 #endif
 
-void *kmem_cache_last_alloc(struct kmem_cache *cachep, void *object)
-{
-#ifdef DEBUG
-	unsigned int objnr;
-	void *objp;
-	struct page *page;
-
-	if (!(cachep->flags & SLAB_STORE_USER))
-		return ERR_PTR(-KMEM_LA_NO_DEBUG);
-	objp = object - obj_offset(cachep);
-	page = virt_to_head_page(objp);
-	objnr = obj_to_index(cachep, page, objp);
-	objp = index_to_obj(cachep, page, objnr);
-	return *dbg_userword(cachep, objp);
-#else
-	return NULL;
-#endif
-}
-
 static __always_inline void *
 __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
 {
@@ -3652,6 +3633,27 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
 EXPORT_SYMBOL(__kmalloc_node_track_caller);
 #endif /* CONFIG_NUMA */
 
+void *kmem_cache_last_alloc(struct kmem_cache *cachep, void *object, void **stackp, int nstackp)
+{
+#ifdef DEBUG
+	unsigned int objnr;
+	void *objp;
+	struct page *page;
+
+	if (!(cachep->flags & SLAB_STORE_USER))
+		return ERR_PTR(-KMEM_LA_NO_DEBUG);
+	objp = object - obj_offset(cachep);
+	page = virt_to_head_page(objp);
+	objnr = obj_to_index(cachep, page, objp);
+	objp = index_to_obj(cachep, page, objnr);
+	if (stackp && nstackp)
+		stackp[0] = NULL;
+	return *dbg_userword(cachep, objp);
+#else
+	return NULL;
+#endif
+}
+
 /**
  * __do_kmalloc - allocate memory
  * @size: how many bytes of memory are required.
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 8430a14..b70f357 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -560,14 +560,22 @@ const char *kmem_last_alloc_errstring(void *lastalloc)
 }
 EXPORT_SYMBOL_GPL(kmem_last_alloc_errstring);
 
-/*
+/**
+ * kmem_last_alloc_stack - Get return address and stack for last allocation
+ * @object: object for which to find last-allocation return address.
+ * @stackp: %NULL or pointer to location to place return-address stack.
+ * @nstackp: maximum number of return addresses that may be stored.
+ *
  * If the pointer references a slab-allocated object and if sufficient
- * debugging is enabled, return the return address for the corresponding
- * allocation.  Otherwise, return NULL.  Note that passing random pointers
- * to this function (including addresses of on-stack variables) is likely
- * to result in panics.
+ * debugging is enabled, return the return address for the corresponding
+ * allocation.  If stackp is non-%NULL in %CONFIG_STACKTRACE kernels running
+ * the slub allocator, also copy the return-address stack into @stackp,
+ * limited by @nstackp.  Otherwise, return %NULL or an appropriate error
+ * code using %ERR_PTR().
+ *
+ * Return: return address from last allocation, %NULL or negative error code.
  */
-void *kmem_last_alloc(void *object)
+void *kmem_last_alloc_stack(void *object, void **stackp, int nstackp)
 {
 	struct page *page;
 
@@ -576,7 +584,24 @@ void *kmem_last_alloc(void *object)
 	page = virt_to_head_page(object);
 	if (!PageSlab(page))
 		return ERR_PTR(-KMEM_LA_NO_SLAB);
-	return kmem_cache_last_alloc(page->slab_cache, object);
+	return kmem_cache_last_alloc(page->slab_cache, object, stackp, nstackp);
+}
+EXPORT_SYMBOL_GPL(kmem_last_alloc_stack);
+
+/**
+ * kmem_last_alloc - Get return address for last allocation
+ * @object: object for which to find last-allocation return address.
+ *
+ * If the pointer references a slab-allocated object and if sufficient
+ * debugging is enabled, return the return address for the corresponding
+ * allocation.  Otherwise, return %NULL or an appropriate error code using
+ * %ERR_PTR().
+ *
+ * Return: return address from last allocation, %NULL or negative error code.
+ */
+void *kmem_last_alloc(void *object)
+{
+	return kmem_last_alloc_stack(object, NULL, 0);
 }
 EXPORT_SYMBOL_GPL(kmem_last_alloc);
 
diff --git a/mm/slob.c b/mm/slob.c
index e7d6b90..dab7f3b 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -461,8 +461,10 @@ static void slob_free(void *block, int size)
 	spin_unlock_irqrestore(&slob_lock, flags);
 }
 
-void *kmem_cache_last_alloc(struct kmem_cache *s, void *object)
+void *kmem_cache_last_alloc(struct kmem_cache *s, void *object, void **stackp, int nstackp)
 {
+	if (stackp && nstackp)
+		stackp[0] = NULL;
 	return ERR_PTR(-KMEM_LA_SLOB);
 }
 
diff --git a/mm/slub.c b/mm/slub.c
index 3ddf16a..a918b1d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3918,10 +3918,11 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
 	return 0;
 }
 
-void *kmem_cache_last_alloc(struct kmem_cache *s, void *object)
+void *kmem_cache_last_alloc(struct kmem_cache *s, void *object, void **stackp, int nstackp)
 {
 #ifdef CONFIG_SLUB_DEBUG
 	void *base;
+	int i = 0;
 	unsigned int objnr;
 	void *objp;
 	struct page *page;
@@ -3938,6 +3939,17 @@ void *kmem_cache_last_alloc(struct kmem_cache *s, void *object)
 	if (objp < base || objp >= base + page->objects * s->size || (objp - base) % s->size)
 		return ERR_PTR(-KMEM_LA_INCONSISTENT);
 	trackp = get_track(s, objp, TRACK_ALLOC);
+#ifdef CONFIG_STACKTRACE
+	if (stackp) {
+		for (; i < nstackp && i < TRACK_ADDRS_COUNT; i++) {
+			stackp[i] = (void *)trackp->addrs[i];
+			if (!stackp[i])
+				break;
+		}
+	}
+#endif
+	if (stackp && i < nstackp)
+		stackp[i] = NULL;
 	return (void *)trackp->addr;
 #else
 	return NULL;
-- 
2.9.5


* [PATCH sl-b 5/6] percpu_ref: Print allocator upon reference-count underflow
From: paulmck @ 2020-12-05  0:40 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Ming Lei,
	Jens Axboe

From: "Paul E. McKenney" <paulmck@kernel.org>

Reference-count underflow for percpu_ref is detected in the RCU callback
percpu_ref_switch_to_atomic_rcu(), and the resulting warning does not
print anything allowing easy identification of which percpu_ref use
case is underflowing.  This is of course not normally a problem when
developing a new percpu_ref use case because it is most likely that
the problem resides in this new use case.  However, when deploying a
new kernel to a large set of servers, the underflow might well be a new
corner case in any of the old percpu_ref use cases.

This commit therefore prints the percpu_ref allocation site using the
new kmem_last_alloc() and kmem_last_alloc_errstring() functions in order
to provide a bit more information for the kernel-deployment case.

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 lib/percpu-refcount.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index e59eda0..8c7b21a0 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -169,6 +169,8 @@ static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu)
 	struct percpu_ref *ref = data->ref;
 	unsigned long __percpu *percpu_count = percpu_count_ptr(ref);
 	unsigned long count = 0;
+	void *allocaddr;
+	const char *allocerr;
 	int cpu;
 
 	for_each_possible_cpu(cpu)
@@ -191,9 +193,16 @@ static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu)
 	 */
 	atomic_long_add((long)count - PERCPU_COUNT_BIAS, &data->count);
 
-	WARN_ONCE(atomic_long_read(&data->count) <= 0,
-		  "percpu ref (%ps) <= 0 (%ld) after switching to atomic",
-		  data->release, atomic_long_read(&data->count));
+	if (atomic_long_read(&data->count) <= 0) {
+		allocaddr = kmem_last_alloc(data);
+		allocerr = kmem_last_alloc_errstring(allocaddr);
+		if (allocerr)
+			WARN_ONCE(1, "percpu ref (%ps) <= 0 (%ld) after switching to atomic (%s)",
+				  data->release, atomic_long_read(&data->count), allocerr);
+		else
+			WARN_ONCE(1, "percpu ref (%ps) <= 0 (%ld) after switching to atomic (allocated at %pS)",
+				  data->release, atomic_long_read(&data->count), allocaddr);
+	}
 
 	/* @ref is viewed as dead on all CPUs, send out switch confirmation */
 	percpu_ref_call_confirm_rcu(rcu);
-- 
2.9.5


* [PATCH sl-b 6/6] percpu_ref: Print stack trace upon reference-count underflow
From: paulmck @ 2020-12-05  0:40 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Ming Lei,
	Jens Axboe

From: "Paul E. McKenney" <paulmck@kernel.org>

In some cases, the allocator return address is in a common function, so
more information is needed.  For example, percpu_ref reference-count
underflow is detected in an RCU callback function that has access only
to a block of memory that is always allocated in percpu_ref_init(), so
the return address alone is unhelpful.

This commit therefore causes the percpu_ref_switch_to_atomic_rcu()
function to use the new kmem_last_alloc_stack() function to collect
and print a stack trace upon reference-count underflow.  This requires
that the kernel use the slub allocator and be built with CONFIG_STACKTRACE=y.
As always, slub debugging must be enabled one way or another, for example,
by booting with the "slub_debug=U" kernel boot parameter.

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 lib/percpu-refcount.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index 8c7b21a0..ebdfa47 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -169,8 +169,6 @@ static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu)
 	struct percpu_ref *ref = data->ref;
 	unsigned long __percpu *percpu_count = percpu_count_ptr(ref);
 	unsigned long count = 0;
-	void *allocaddr;
-	const char *allocerr;
 	int cpu;
 
 	for_each_possible_cpu(cpu)
@@ -194,14 +192,26 @@ static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu)
 	atomic_long_add((long)count - PERCPU_COUNT_BIAS, &data->count);
 
 	if (atomic_long_read(&data->count) <= 0) {
-		allocaddr = kmem_last_alloc(data);
+		void *allocaddr;
+		const char *allocerr;
+		void *allocstack[8];
+		int i;
+
+		allocaddr = kmem_last_alloc_stack(data, allocstack, ARRAY_SIZE(allocstack));
 		allocerr = kmem_last_alloc_errstring(allocaddr);
-		if (allocerr)
+		if (allocerr) {
 			WARN_ONCE(1, "percpu ref (%ps) <= 0 (%ld) after switching to atomic (%s)",
 				  data->release, atomic_long_read(&data->count), allocerr);
-		else
-			WARN_ONCE(1, "percpu ref (%ps) <= 0 (%ld) after switching to atomic (allocated at %pS)",
-				  data->release, atomic_long_read(&data->count), allocaddr);
+		} else {
+			pr_err("percpu ref (%ps) <= 0 (%ld) after switching to atomic (allocated at %pS)\n",
+			       data->release, atomic_long_read(&data->count), allocaddr);
+			for (i = 0; i < ARRAY_SIZE(allocstack); i++) {
+				if (!allocstack[i])
+					break;
+				pr_err("\t%pS\n", allocstack[i]);
+			}
+			WARN_ON_ONCE(1);
+		}
 	}
 
 	/* @ref is viewed as dead on all CPUs, send out switch confirmation */
-- 
2.9.5


* Re: [PATCH sl-b 1/6] mm: Add kmem_last_alloc() to return last allocation for memory block
From: Joonsoo Kim @ 2020-12-07  9:02 UTC
  To: paulmck
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Christoph Lameter, Pekka Enberg,
	David Rientjes, linux-mm

Hello, Paul.

On Fri, Dec 04, 2020 at 04:40:52PM -0800, paulmck@kernel.org wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> There are kernel facilities such as per-CPU reference counts that give
> error messages in generic handlers or callbacks, whose messages are
> unenlightening.  In the case of per-CPU reference-count underflow, this
> is not a problem when creating a new use of this facility because in that
> case the bug is almost certainly in the code implementing that new use.
> However, trouble arises when deploying across many systems, which might
> exercise corner cases that were not seen during development and testing.
> Here, it would be really nice to get some kind of hint as to which of
> several uses the underflow was caused by.
> 
> This commit therefore exposes a new kmem_last_alloc() function that
> takes a pointer to dynamically allocated memory and returns the return
> address of the call that allocated it.  This pointer can reference the
> middle of the block as well as the beginning of the block, as needed
> by things like RCU callback functions and timer handlers that might not
> know where the beginning of the memory block is.  These functions and
> handlers can use the return value from kmem_last_alloc() to give the
> kernel hacker a better hint as to where the problem might lie.

I agree with exposing allocation caller information to the other
subsystem to help the debugging. Some suggestions...

1. It's better to separate a slab object check (validity check) and
retrieving the allocation caller. Someone else would want to check
only a validity. And, it doesn't depend on the debug configuration so
it's not good to bind it to the debug function.

kmem_cache_valid_(obj|ptr)
kmalloc_valid_(obj|ptr)

2. rename kmem_last_alloc to ...

int kmem_cache_debug_alloc_caller(cache, obj, &ret_addr)
int kmalloc_debug_alloc_caller(obj, &ret_addr)

or debug_kmem_cache_alloc_caller()

I think that function name need to include the keyword 'debug' to show
itself as a debugging facility (enabled at the debugging). And, return
errno and get caller address by pointer argument.

3. If concrete error message is needed, please introduce more functions.

void *kmalloc_debug_error(errno)

Thanks.

* Re: [PATCH sl-b 1/6] mm: Add kmem_last_alloc() to return last allocation for memory block
From: Paul E. McKenney @ 2020-12-07 17:25 UTC
  To: Joonsoo Kim
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Christoph Lameter, Pekka Enberg,
	David Rientjes, linux-mm

On Mon, Dec 07, 2020 at 06:02:53PM +0900, Joonsoo Kim wrote:
> Hello, Paul.
> 
> On Fri, Dec 04, 2020 at 04:40:52PM -0800, paulmck@kernel.org wrote:
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > There are kernel facilities such as per-CPU reference counts that give
> > error messages in generic handlers or callbacks, whose messages are
> > unenlightening.  In the case of per-CPU reference-count underflow, this
> > is not a problem when creating a new use of this facility because in that
> > case the bug is almost certainly in the code implementing that new use.
> > However, trouble arises when deploying across many systems, which might
> > exercise corner cases that were not seen during development and testing.
> > Here, it would be really nice to get some kind of hint as to which of
> > several uses the underflow was caused by.
> > 
> > This commit therefore exposes a new kmem_last_alloc() function that
> > takes a pointer to dynamically allocated memory and returns the return
> > address of the call that allocated it.  This pointer can reference the
> > middle of the block as well as the beginning of the block, as needed
> > by things like RCU callback functions and timer handlers that might not
> > know where the beginning of the memory block is.  These functions and
> > handlers can use the return value from kmem_last_alloc() to give the
> > kernel hacker a better hint as to where the problem might lie.
> 
> I agree with exposing allocation caller information to the other
> subsystem to help the debugging. Some suggestions...

Good to hear!  ;-)

> 1. It's better to separate a slab object check (validity check) and
> retrieving the allocation caller. Someone else would want to check
> only a validity. And, it doesn't depend on the debug configuration so
> it's not good to bind it to the debug function.
> 
> kmem_cache_valid_(obj|ptr)
> kmalloc_valid_(obj|ptr)

Here both functions would say "true" for a pointer from kmalloc()?
Or do I need to add a third function that is happy with a pointer from
either source?

I do understand that people who don't want to distinguish could just do
"kmem_cache_valid_ptr(p) || kmalloc_valid_ptr(p)".  However, the two
use cases in the series have no idea whether the pointer they have came
from kmalloc(), kmem_cache_alloc(), or somewhere else entirely, even an
on-stack variable.

Are you asking me to choose between the _obj() and _ptr() suffixes?
If not, please help me understand the distinction.

Do we want "debug" in these names as well?

> 2. rename kmem_last_alloc to ...
> 
> int kmem_cache_debug_alloc_caller(cache, obj, &ret_addr)
> int kmalloc_debug_alloc_caller(obj, &ret_addr)
> 
> or debug_kmem_cache_alloc_caller()
> 
> I think that function name need to include the keyword 'debug' to show
> itself as a debugging facility (enabled at the debugging). And, return
> errno and get caller address by pointer argument.

I am quite happy to add the "debug", but my use cases have no idea
how the pointer was allocated.  In fact, the next version of the
patch will also handle allocator return addresses from vmalloc().

And for kernels without sufficient debug enabled, I need to provide
the name of the slab cache, and this also is to be in the next version.

> 3. If concrete error message is needed, please introduce more functions.
> 
> void *kmalloc_debug_error(errno)

Agreed, in fact, I was planning to have a function that printed out
a suitable error-message continuation to the console for ease-of-use
reasons.  For example, why is the caller deciding how deep the stack
frame is?  ;-)

So something like this?

	void kmalloc_debug_print_provenance(void *ptr);

With the understanding that it will print something helpful regardless
of where ptr came from, within the constraints of the kernel build and
boot options?

							Thanx, Paul

* Re: [PATCH sl-b 1/6] mm: Add kmem_last_alloc() to return last allocation for memory block
From: Joonsoo Kim @ 2020-12-08  8:57 UTC
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Christoph Lameter, Pekka Enberg,
	David Rientjes, linux-mm

On Mon, Dec 07, 2020 at 09:25:54AM -0800, Paul E. McKenney wrote:
> On Mon, Dec 07, 2020 at 06:02:53PM +0900, Joonsoo Kim wrote:
> > Hello, Paul.
> > 
> > On Fri, Dec 04, 2020 at 04:40:52PM -0800, paulmck@kernel.org wrote:
> > > From: "Paul E. McKenney" <paulmck@kernel.org>
> > > 
> > > There are kernel facilities such as per-CPU reference counts that give
> > > error messages in generic handlers or callbacks, whose messages are
> > > unenlightening.  In the case of per-CPU reference-count underflow, this
> > > is not a problem when creating a new use of this facility because in that
> > > case the bug is almost certainly in the code implementing that new use.
> > > However, trouble arises when deploying across many systems, which might
> > > exercise corner cases that were not seen during development and testing.
> > > Here, it would be really nice to get some kind of hint as to which of
> > > several uses the underflow was caused by.
> > > 
> > > This commit therefore exposes a new kmem_last_alloc() function that
> > > takes a pointer to dynamically allocated memory and returns the return
> > > address of the call that allocated it.  This pointer can reference the
> > > middle of the block as well as the beginning of the block, as needed
> > > by things like RCU callback functions and timer handlers that might not
> > > know where the beginning of the memory block is.  These functions and
> > > handlers can use the return value from kmem_last_alloc() to give the
> > > kernel hacker a better hint as to where the problem might lie.
> > 
> > I agree with exposing allocation caller information to the other
> > subsystem to help the debugging. Some suggestions...
> 
> Good to hear!  ;-)
> 
> > 1. It's better to separate a slab object check (validity check) and
> > retrieving the allocation caller. Someone else would want to check
> > only a validity. And, it doesn't depend on the debug configuration so
> > it's not good to bind it to the debug function.
> > 
> > kmem_cache_valid_(obj|ptr)
> > kmalloc_valid_(obj|ptr)
> 
> Here both functions would say "true" for a pointer from kmalloc()?
> Or do I need to add a third function that is happy with a pointer from
> either source?

I focused on separation and missed the case where the user sometimes
cannot know the object's source (kmalloc/kmem_cache). As a first step,
just checking whether or not it is a slab object looks sufficient.

int kmem_valid_obj()

> 
> I do understand that people who don't want to distinguish could just do
> "kmem_cache_valid_ptr(p) || kmalloc_valid_ptr(p)".  However, the two
> use cases in the series have no idea whether the pointer they have came
> from kmalloc(), kmem_cache_alloc(), or somewhere else entirely, even an
> on-stack variable.
> 
> Are you asking me to choose between the _obj() and _ptr() suffixes?

Yes, I prefer _obj().

> If not, please help me understand the distinction.
> 
> Do we want "debug" in these names as well?

I don't think so since it can be called without enabling the debug
option.

> 
> > 2. rename kmem_last_alloc to ...
> > 
> > int kmem_cache_debug_alloc_caller(cache, obj, &ret_addr)
> > int kmalloc_debug_alloc_caller(obj, &ret_addr)
> > 
> > or debug_kmem_cache_alloc_caller()
> > 
> > I think that function name need to include the keyword 'debug' to show
> > itself as a debugging facility (enabled at the debugging). And, return
> > errno and get caller address by pointer argument.
> 
> I am quite happy to add the "debug", but my use cases have no idea
> how the pointer was allocated.  In fact, the next version of the
> patch will also handle allocator return addresses from vmalloc().
> 
> And for kernels without sufficient debug enabled, I need to provide
> the name of the slab cache, and this also is to be in the next version.

Okay. So, your code would be...

if (kmem_valid_obj(ptr))
        kmalloc_debug_print_provenance(ptr)
else if (vmalloc_valid_obj(ptr))
        ....

> > 3. If concrete error message is needed, please introduce more functions.
> > 
> > void *kmalloc_debug_error(errno)
> 
> Agreed, in fact, I was planning to have a function that printed out
> a suitable error-message continuation to the console for ease-of-use
> reasons.  For example, why is the caller deciding how deep the stack
> frame is?  ;-)
> 
> So something like this?
> 
> 	void kmalloc_debug_print_provenance(void *ptr);
> 
> With the understanding that it will print something helpful regardless
> of where ptr came from, within the constraints of the kernel build and
> boot options?

Looks like a good idea. I suggest the name kmem_dump_obj() for this
function. In this case, I don't think the "debug" keyword is needed,
since the function shows something useful (slab-cache info) even if the
debug option isn't enabled.

So, in summary, we need to introduce two functions to accomplish your
purpose. Please correct me if I am wrong.

int kmem_valid_obj(ptr)
void kmem_dump_obj(ptr)

Thanks.

* Re: [PATCH sl-b 1/6] mm: Add kmem_last_alloc() to return last allocation for memory block
From: Paul E. McKenney @ 2020-12-08 15:17 UTC
  To: Joonsoo Kim
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Christoph Lameter, Pekka Enberg,
	David Rientjes, linux-mm

On Tue, Dec 08, 2020 at 05:57:07PM +0900, Joonsoo Kim wrote:
> On Mon, Dec 07, 2020 at 09:25:54AM -0800, Paul E. McKenney wrote:
> > On Mon, Dec 07, 2020 at 06:02:53PM +0900, Joonsoo Kim wrote:
> > > Hello, Paul.
> > > 
> > > On Fri, Dec 04, 2020 at 04:40:52PM -0800, paulmck@kernel.org wrote:
> > > > From: "Paul E. McKenney" <paulmck@kernel.org>
> > > > 
> > > > There are kernel facilities such as per-CPU reference counts that give
> > > > error messages in generic handlers or callbacks, whose messages are
> > > > unenlightening.  In the case of per-CPU reference-count underflow, this
> > > > is not a problem when creating a new use of this facility because in that
> > > > case the bug is almost certainly in the code implementing that new use.
> > > > However, trouble arises when deploying across many systems, which might
> > > > exercise corner cases that were not seen during development and testing.
> > > > Here, it would be really nice to get some kind of hint as to which of
> > > > several uses the underflow was caused by.
> > > > 
> > > > This commit therefore exposes a new kmem_last_alloc() function that
> > > > takes a pointer to dynamically allocated memory and returns the return
> > > > address of the call that allocated it.  This pointer can reference the
> > > > middle of the block as well as the beginning of the block, as needed
> > > > by things like RCU callback functions and timer handlers that might not
> > > > know where the beginning of the memory block is.  These functions and
> > > > handlers can use the return value from kmem_last_alloc() to give the
> > > > kernel hacker a better hint as to where the problem might lie.
> > > 
> > > I agree with exposing allocation caller information to the other
> > > subsystem to help the debugging. Some suggestions...
> > 
> > Good to hear!  ;-)
> > 
> > > 1. It's better to separate a slab object check (validity check) and
> > > retrieving the allocation caller. Someone else would want to check
> > > only a validity. And, it doesn't depend on the debug configuration so
> > > it's not good to bind it to the debug function.
> > > 
> > > kmem_cache_valid_(obj|ptr)
> > > kmalloc_valid_(obj|ptr)
> > 
> > Here both functions would say "true" for a pointer from kmalloc()?
> > Or do I need to add a third function that is happy with a pointer from
> > either source?
> 
> I focused on separation and missed the case where the user sometimes
> cannot know the object's source (kmalloc/kmem_cache). As a first step,
> just checking whether or not it is a slab object looks sufficient.
> 
> int kmem_valid_obj()

OK, I will update my current kmalloc_valid_obj() to kmem_valid_obj(),
thank you!

> > I do understand that people who don't want to distinguish could just do
> > "kmem_cache_valid_ptr(p) || kmalloc_valid_ptr(p)".  However, the two
> > use cases in the series have no idea whether the pointer they have came
> > from kmalloc(), kmem_cache_alloc(), or somewhere else entirely, even an
> > on-stack variable.
> > 
> > Are you asking me to choose between the _obj() and _ptr() suffixes?
> 
> Yes, I prefer _obj().

Then _obj() it is.

> > If not, please help me understand the distinction.
> > 
> > Do we want "debug" in these names as well?
> 
> I don't think so since it can be called without enabling the debug
> option.

OK, understood.

> > > 2. rename kmem_last_alloc to ...
> > > 
> > > int kmem_cache_debug_alloc_caller(cache, obj, &ret_addr)
> > > int kmalloc_debug_alloc_caller(obj, &ret_addr)
> > > 
> > > or debug_kmem_cache_alloc_caller()
> > > 
> > > I think that function name need to include the keyword 'debug' to show
> > > itself as a debugging facility (enabled at the debugging). And, return
> > > errno and get caller address by pointer argument.
> > 
> > I am quite happy to add the "debug", but my use cases have no idea
> > how the pointer was allocated.  In fact, the next version of the
> > patch will also handle allocator return addresses from vmalloc().
> > 
> > And for kernels without sufficient debug enabled, I need to provide
> > the name of the slab cache, and this also is to be in the next version.
> 
> Okay. So, your code would be...
> 
> if (kmem_valid_obj(ptr))
>         kmalloc_debug_print_provenance(ptr)
> else if (vmalloc_valid_obj(ptr))
>         ....

Suggestions on where to put the mem_dump_obj() function (or whatever it
ends up being named) that executes this code?  Left to myself, I will
pick a likely spot on the theory that it can always be moved later.

This structuring does cause double work, but this should be OK because
all of the uses I know of are on error paths.

> > > 3. If concrete error message is needed, please introduce more functions.
> > > 
> > > void *kmalloc_debug_error(errno)
> > 
> > Agreed, in fact, I was planning to have a function that printed out
> > a suitable error-message continuation to the console for ease-of-use
> > reasons.  For example, why is the caller deciding how deep the stack
> > frame is?  ;-)
> > 
> > So something like this?
> > 
> > 	void kmalloc_debug_print_provenance(void *ptr);
> > 
> > With the understanding that it will print something helpful regardless
> > of where ptr came from, within the constraints of the kernel build and
> > boot options?
> 
> Looks like a good idea. I suggest the name kmem_dump_obj() for this
> function. In this case, I don't think the "debug" keyword is needed,
> since the function shows something useful (slab-cache info) even if the
> debug option isn't enabled.
> 
> So, in summary, we need to introduce two functions to accomplish your
> purpose. Please correct me if I am wrong.
> 
> int kmem_valid_obj(ptr)
> void kmem_dump_obj(ptr)

Within slab, agreed.

We of course also need something like mem_dump_obj() to handle a pointer
with unknown provenance, along with the vmalloc_valid_obj() and the
vmalloc_dump_obj().  And similar functions, should other allocation
sources become important.
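
To make the division of labor concrete, here is one possible shape for
the dispatcher.  Everything beyond kmem_valid_obj() and kmem_dump_obj()
is guesswork at this point, including the assumed boolean return from
vmalloc_dump_obj():

void mem_dump_obj(void *object)
{
        if (kmem_valid_obj(object)) {
                kmem_dump_obj(object);
                return;
        }
        if (vmalloc_dump_obj(object))  /* Assumed to return true on success. */
                return;
        pr_info("%p is neither a slab object nor a vmalloc object.\n", object);
}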

							Thanx, Paul

* [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics
From: Paul E. McKenney @ 2020-12-09  1:11 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel

Hello!

This is v2 of the series that improves diagnostics by providing access
to additional information, including the return addresses collected by
the sl*b allocators and by vmalloc().  If the allocator is not configured
to collect this information, the diagnostics fall back to a reasonable
approximation of their earlier state.

One use case is the queue_rcu_work() function, which might be used
by any number of kernel subsystems.  If the caller does back-to-back
invocations of queue_rcu_work(), this constitutes a double-free bug,
and (if so configured) the debug-objects system will flag this, printing
the callback function.  In most cases, printing this function suffices.
However, for double-free bugs involving queue_rcu_work(), the RCU callback
function will always be rcu_work_rcufn(), which provides almost no help to
the poor person trying to find this double-free bug.  The return address
from the allocator of the memory containing the rcu_work structure can
provide an additional valuable clue.

Another use case is the percpu_ref_switch_to_atomic_rcu() function,
which detects percpu_ref reference-count underflow.  Unfortunately,
the only data that this function has access to doesn't have much in the
way of identifying characteristics.  Yes, it might be possible to gain
more information from a crash dump, but it is more convenient for the
needed hints to be in the console log.

Unfortunately, printing the return address in this case is of little help
because this object is allocated from percpu_ref_init(), regardless of
what part of the kernel is responsible for the reference-count underflow.
However, CONFIG_STACKTRACE=y kernels (such as those enabling ftrace)
using slub with debugging enabled also collect stack traces.  This series
therefore also provides a way of extracting these stack traces to provide
additional information to those debugging percpu_ref reference-count
underflows.

The patches are as follows:

1.	Add mem_dump_obj() to print source of memory block.

2.	Make mem_dump_obj() handle NULL and zero-sized pointers.

3.	Make mem_dump_obj() handle vmalloc() memory.

4.	Make call_rcu() print mem_dump_obj() info for double-freed
	callback.

5.	Dump mem_dump_obj() info upon reference-count underflow.

						Thanx, Paul


Changes since v1:

o	Apply feedback from Joonsoo Kim, mostly around naming and
	code structure.

o	Apply fix suggested by Stephen Rothwell for a bug that was
	also located by the kbuild test robot.

o	Add support for vmalloc().

o	Add support for special pointers.

o	Additional rework simplifying use of mem_dump_obj(), which
	simplifies both the RCU and the percpu_ref uses.

------------------------------------------------------------------------

 include/linux/mm.h      |    2 +
 include/linux/slab.h    |    2 +
 include/linux/vmalloc.h |    6 ++++
 kernel/rcu/tree.c       |    7 +++-
 lib/percpu-refcount.c   |   12 ++++++--
 mm/slab.c               |   28 +++++++++++++++++++
 mm/slab.h               |   11 +++++++
 mm/slab_common.c        |   69 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/slob.c               |    7 ++++
 mm/slub.c               |   40 +++++++++++++++++++++++++++
 mm/util.c               |   44 ++++++++++++++++++++++++++----
 mm/vmalloc.c            |   12 ++++++++
 12 files changed, 229 insertions(+), 11 deletions(-)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v2 sl-b 1/5] mm: Add mem_dump_obj() to print source of memory block
  2020-12-09  1:11 ` [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics Paul E. McKenney
@ 2020-12-09  1:12   ` paulmck
  2020-12-09  8:17     ` Christoph Hellwig
                       ` (2 more replies)
  2020-12-09  1:13   ` [PATCH v2 sl-b 2/5] mm: Make mem_dump_obj() handle NULL and zero-sized pointers paulmck
                     ` (4 subsequent siblings)
  5 siblings, 3 replies; 49+ messages in thread
From: paulmck @ 2020-12-09  1:12 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Paul E. McKenney, Christoph Lameter, Pekka Enberg,
	David Rientjes, linux-mm

From: "Paul E. McKenney" <paulmck@kernel.org>

There are kernel facilities such as per-CPU reference counts that give
error messages in generic handlers or callbacks, whose messages are
unenlightening.  In the case of per-CPU reference-count underflow, this
is not a problem when creating a new use of this facility because in that
case the bug is almost certainly in the code implementing that new use.
However, trouble arises when deploying across many systems, which might
exercise corner cases that were not seen during development and testing.
Here, it would be really nice to get some kind of hint as to which of
several uses the underflow was caused by.

This commit therefore exposes a mem_dump_obj() function that takes
a pointer to memory (which must still be allocated if it has been
dynamically allocated) and prints available information on where that
memory came from.  This pointer can reference the middle of the block as
well as the beginning of the block, as needed by things like RCU callback
functions and timer handlers that might not know where the beginning of
the memory block is.  These functions and handlers can use mem_dump_obj()
to print out better hints as to where the problem might lie.
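
For example, an error-path check might look like the following sketch,
in which "rhp" and the preamble text are illustrative:

	pr_err("Unexpected object encountered:");
	mem_dump_obj(rhp); /* rhp may point into the middle of the block. */

The preamble deliberately omits the trailing newline so that
mem_dump_obj()'s pr_cont() output continues on the same line.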

The information printed can depend on kernel configuration.  For example,
the allocation return address can be printed only for slab and slub,
and even then only when the necessary debugging has been enabled.  For slab,
build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
to the next power of two or pass the SLAB_STORE_USER flag when creating
the kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
if more focused use is desired.  Also for slub, build with
CONFIG_STACKTRACE=y to enable printing of the allocation-time stack trace.
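
For example, focused use on a single slub cache might look like the
following sketch, in which the cache name and object type are
illustrative:

	struct kmem_cache *foo_cache;

	foo_cache = kmem_cache_create("foo_cache", sizeof(struct foo),
				      0, SLAB_STORE_USER, NULL);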

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
[ paulmck: Convert to printing and change names per Joonsoo Kim. ]
[ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/mm.h   |  2 ++
 include/linux/slab.h |  2 ++
 mm/slab.c            | 28 +++++++++++++++++++++
 mm/slab.h            | 11 +++++++++
 mm/slab_common.c     | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/slob.c            |  7 ++++++
 mm/slub.c            | 40 ++++++++++++++++++++++++++++++
 mm/util.c            | 25 +++++++++++++++++++
 8 files changed, 184 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ef360fe..1eea266 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3153,5 +3153,7 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
 
 extern int sysctl_nr_trim_pages;
 
+void mem_dump_obj(void *object);
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/include/linux/slab.h b/include/linux/slab.h
index dd6897f..169b511 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -186,6 +186,8 @@ void kfree(const void *);
 void kfree_sensitive(const void *);
 size_t __ksize(const void *);
 size_t ksize(const void *);
+bool kmem_valid_obj(void *object);
+void kmem_dump_obj(void *object);
 
 #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
 void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
diff --git a/mm/slab.c b/mm/slab.c
index b111356..72b6743 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3602,6 +3602,34 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
 EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
 #endif
 
+void kmem_provenance(struct kmem_provenance *kpp)
+{
+#ifdef DEBUG
+	struct kmem_cache *cachep;
+	void *object = kpp->kp_ptr;
+	unsigned int objnr;
+	void *objp;
+	struct page *page = kpp->kp_page;
+
+	cachep = page->slab_cache;
+	if (!(cachep->flags & SLAB_STORE_USER)) {
+		kpp->kp_ret = NULL;
+		goto nodebug;
+	}
+	objp = object - obj_offset(cachep);
+	page = virt_to_head_page(objp);
+	objnr = obj_to_index(cachep, page, objp);
+	objp = index_to_obj(cachep, page, objnr);
+	kpp->kp_objp = objp;
+	kpp->kp_ret = *dbg_userword(cachep, objp);
+nodebug:
+#else
+	kpp->kp_ret = NULL;
+#endif
+	if (kpp->kp_nstack)
+		kpp->kp_stack[0] = NULL;
+}
+
 static __always_inline void *
 __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
 {
diff --git a/mm/slab.h b/mm/slab.h
index 6d7c6a5..28a41d5 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -630,4 +630,15 @@ static inline bool slab_want_init_on_free(struct kmem_cache *c)
 	return false;
 }
 
+#define KS_ADDRS_COUNT 16
+struct kmem_provenance {
+	void *kp_ptr;
+	struct page *kp_page;
+	void *kp_objp;
+	void *kp_ret;
+	void *kp_stack[KS_ADDRS_COUNT];
+	int kp_nstack;
+};
+void kmem_provenance(struct kmem_provenance *kpp);
+
 #endif /* MM_SLAB_H */
diff --git a/mm/slab_common.c b/mm/slab_common.c
index f9ccd5d..09f0cbc 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -536,6 +536,75 @@ bool slab_is_available(void)
 	return slab_state >= UP;
 }
 
+/**
+ * kmem_valid_obj - does the pointer reference a valid slab object?
+ * @object: pointer to query.
+ *
+ * Return: %true if the pointer is to a not-yet-freed object from
+ * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
+ * is to an already-freed object, and %false otherwise.
+ */
+bool kmem_valid_obj(void *object)
+{
+	struct page *page;
+
+	if (!virt_addr_valid(object))
+		return false;
+	page = virt_to_head_page(object);
+	return PageSlab(page);
+}
+EXPORT_SYMBOL_GPL(kmem_valid_obj);
+
+/**
+ * kmem_dump_obj - Print available slab provenance information
+ * @object: slab object for which to find provenance information.
+ *
+ * This function uses pr_cont(), so that the caller is expected to have
+ * printed out whatever preamble is appropriate.  The provenance information
+ * depends on the type of object and on how much debugging is enabled.
+ * For a slab-cache object, the fact that it is a slab object is printed,
+ * and, if available, the slab name, return address, and stack trace from
+ * the allocation of that object.
+ *
+ * This function will splat if passed a pointer to a non-slab object.
+ * If you are not sure what type of object you have, you should instead
+ * use mem_dump_obj().
+ */
+void kmem_dump_obj(void *object)
+{
+	int i;
+	struct page *page;
+	struct kmem_provenance kp;
+
+	if (WARN_ON_ONCE(!virt_addr_valid(object)))
+		return;
+	page = virt_to_head_page(object);
+	if (WARN_ON_ONCE(!PageSlab(page))) {
+		pr_cont(" non-slab memory.\n");
+		return;
+	}
+	kp.kp_ptr = object;
+	kp.kp_page = page;
+	kp.kp_nstack = KS_ADDRS_COUNT;
+	kmem_provenance(&kp);
+	if (page->slab_cache)
+		pr_cont(" slab %s", page->slab_cache->name);
+	else
+		pr_cont(" slab ");
+	if (kp.kp_ret)
+		pr_cont(" allocated at %pS\n", kp.kp_ret);
+	else
+		pr_cont("\n");
+	if (kp.kp_stack[0]) {
+		for (i = 0; i < ARRAY_SIZE(kp.kp_stack); i++) {
+			if (!kp.kp_stack[i])
+				break;
+			pr_info("    %pS\n", kp.kp_stack[i]);
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(kmem_dump_obj);
+
 #ifndef CONFIG_SLOB
 /* Create a cache during boot when no slab services are available yet */
 void __init create_boot_cache(struct kmem_cache *s, const char *name,
diff --git a/mm/slob.c b/mm/slob.c
index 7cc9805..fb10493 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -461,6 +461,13 @@ static void slob_free(void *block, int size)
 	spin_unlock_irqrestore(&slob_lock, flags);
 }
 
+void kmem_provenance(struct kmem_provenance *kpp)
+{
+	kpp->kp_ret = NULL;
+	if (kpp->kp_nstack)
+		kpp->kp_stack[0] = NULL;
+}
+
 /*
  * End of slob allocator proper. Begin kmem_cache_alloc and kmalloc frontend.
  */
diff --git a/mm/slub.c b/mm/slub.c
index b30be23..027fe0f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3918,6 +3918,46 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
 	return 0;
 }
 
+void kmem_provenance(struct kmem_provenance *kpp)
+{
+#ifdef CONFIG_SLUB_DEBUG
+	void *base;
+	int i;
+	void *object = kpp->kp_ptr;
+	unsigned int objnr;
+	void *objp;
+	struct page *page = kpp->kp_page;
+	struct kmem_cache *s = page->slab_cache;
+	struct track *trackp;
+
+	base = page_address(page);
+	objp = kasan_reset_tag(object);
+	objp = restore_red_left(s, objp);
+	objnr = obj_to_index(s, page, objp);
+	objp = base + s->size * objnr;
+	kpp->kp_objp = objp;
+	if (WARN_ON_ONCE(objp < base || objp >= base + page->objects * s->size || (objp - base) % s->size) ||
+	    !(s->flags & SLAB_STORE_USER))
+		goto nodebug;
+	trackp = get_track(s, objp, TRACK_ALLOC);
+	kpp->kp_ret = (void *)trackp->addr;
+#ifdef CONFIG_STACKTRACE
+	for (i = 0; i < kpp->kp_nstack && i < TRACK_ADDRS_COUNT; i++) {
+		kpp->kp_stack[i] = (void *)trackp->addrs[i];
+		if (!kpp->kp_stack[i])
+			break;
+	}
+#endif
+	if (kpp->kp_stack && i < kpp->kp_nstack)
+		kpp->kp_stack[i] = NULL;
+	return;
+nodebug:
+#endif
+	kpp->kp_ret = NULL;
+	if (kpp->kp_nstack)
+		kpp->kp_stack[0] = NULL;
+}
+
 /********************************************************************
  *		Kmalloc subsystem
  *******************************************************************/
diff --git a/mm/util.c b/mm/util.c
index 4ddb6e1..d0e60d2 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -970,3 +970,28 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
 	kunmap_atomic(addr1);
 	return ret;
 }
+
+/**
+ * mem_dump_obj - Print available provenance information
+ * @object: object for which to find provenance information.
+ *
+ * This function uses pr_cont(), so that the caller is expected to have
+ * printed out whatever preamble is appropriate.  The provenance information
+ * depends on the type of object and on how much debugging is enabled.
+ * For example, for a slab-cache object, the slab name is printed, and,
+ * if available, the return address and stack trace from the allocation
+ * of that object.
+ */
+void mem_dump_obj(void *object)
+{
+	if (!virt_addr_valid(object)) {
+		pr_cont(" non-paged (local) memory.\n");
+		return;
+	}
+	if (kmem_valid_obj(object)) {
+		kmem_dump_obj(object);
+		return;
+	}
+	pr_cont(" non-slab memory.\n");
+}
+EXPORT_SYMBOL_GPL(mem_dump_obj);
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 sl-b 2/5] mm: Make mem_dump_obj() handle NULL and zero-sized pointers
  2020-12-09  1:11 ` [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics Paul E. McKenney
  2020-12-09  1:12   ` [PATCH v2 sl-b 1/5] mm: Add mem_dump_obj() to print source of memory block paulmck
@ 2020-12-09  1:13   ` paulmck
  2020-12-09 17:48     ` Vlastimil Babka
  2020-12-09  1:13   ` [PATCH v2 sl-b 3/5] mm: Make mem_dump_obj() handle vmalloc() memory paulmck
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 49+ messages in thread
From: paulmck @ 2020-12-09  1:13 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Paul E. McKenney, Christoph Lameter, Pekka Enberg,
	David Rientjes, linux-mm

From: "Paul E. McKenney" <paulmck@kernel.org>

This commit makes mem_dump_obj() call out NULL and zero-sized pointers
specially instead of classifying them as non-paged memory.

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 mm/util.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/util.c b/mm/util.c
index d0e60d2..8c2449f 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -985,7 +985,12 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
 void mem_dump_obj(void *object)
 {
 	if (!virt_addr_valid(object)) {
-		pr_cont(" non-paged (local) memory.\n");
+		if (object == NULL)
+			pr_cont(" NULL pointer.\n");
+		else if (object == ZERO_SIZE_PTR)
+			pr_cont(" zero-size pointer.\n");
+		else
+			pr_cont(" non-paged (local) memory.\n");
 		return;
 	}
 	if (kmem_valid_obj(object)) {
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 sl-b 3/5] mm: Make mem_dump_obj() handle vmalloc() memory
  2020-12-09  1:11 ` [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics Paul E. McKenney
  2020-12-09  1:12   ` [PATCH v2 sl-b 1/5] mm: Add mem_dump_obj() to print source of memory block paulmck
  2020-12-09  1:13   ` [PATCH v2 sl-b 2/5] mm: Make mem_dump_obj() handle NULL and zero-sized pointers paulmck
@ 2020-12-09  1:13   ` paulmck
  2020-12-09 17:51     ` Vlastimil Babka
  2020-12-09 19:36     ` Uladzislau Rezki
  2020-12-09  1:13   ` [PATCH v2 sl-b 4/5] rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback paulmck
                     ` (2 subsequent siblings)
  5 siblings, 2 replies; 49+ messages in thread
From: paulmck @ 2020-12-09  1:13 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Paul E. McKenney, linux-mm

From: "Paul E. McKenney" <paulmck@kernel.org>

This commit adds vmalloc() support to mem_dump_obj().  Note that the
vmalloc_dump_obj() function combines the checking and dumping, in
contrast with the split between kmem_valid_obj() and kmem_dump_obj().
The reason for the difference is that the checking in the vmalloc()
case involves acquiring a global lock, and redundant acquisitions of
global locks should be avoided, even on not-so-fast paths.

Note that this change causes on-stack variables to be reported as
vmalloc() storage from kernel_clone() or similar, depending on the degree
of inlining that your compiler does.  This is likely more helpful than
the earlier "non-paged (local) memory".

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/vmalloc.h |  6 ++++++
 mm/util.c               | 12 +++++++-----
 mm/vmalloc.c            | 12 ++++++++++++
 3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 938eaf9..c89c2be 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -248,4 +248,10 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 int register_vmap_purge_notifier(struct notifier_block *nb);
 int unregister_vmap_purge_notifier(struct notifier_block *nb);
 
+#ifdef CONFIG_MMU
+bool vmalloc_dump_obj(void *object);
+#else
+static inline bool vmalloc_dump_obj(void *object) { return false; }
+#endif
+
 #endif /* _LINUX_VMALLOC_H */
diff --git a/mm/util.c b/mm/util.c
index 8c2449f..ee99a0a 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -984,6 +984,12 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
  */
 void mem_dump_obj(void *object)
 {
+	if (kmem_valid_obj(object)) {
+		kmem_dump_obj(object);
+		return;
+	}
+	if (vmalloc_dump_obj(object))
+		return;
 	if (!virt_addr_valid(object)) {
 		if (object == NULL)
 			pr_cont(" NULL pointer.\n");
@@ -993,10 +999,6 @@ void mem_dump_obj(void *object)
 			pr_cont(" non-paged (local) memory.\n");
 		return;
 	}
-	if (kmem_valid_obj(object)) {
-		kmem_dump_obj(object);
-		return;
-	}
-	pr_cont(" non-slab memory.\n");
+	pr_cont(" non-slab/vmalloc memory.\n");
 }
 EXPORT_SYMBOL_GPL(mem_dump_obj);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6ae491a..7421719 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3431,6 +3431,18 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 }
 #endif	/* CONFIG_SMP */
 
+bool vmalloc_dump_obj(void *object)
+{
+	struct vm_struct *vm;
+	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
+
+	vm = find_vm_area(objp);
+	if (!vm)
+		return false;
+	pr_cont(" vmalloc allocated at %pS\n", vm->caller);
+	return true;
+}
+
 #ifdef CONFIG_PROC_FS
 static void *s_start(struct seq_file *m, loff_t *pos)
 	__acquires(&vmap_purge_lock)
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 sl-b 4/5] rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback
  2020-12-09  1:11 ` [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics Paul E. McKenney
                     ` (2 preceding siblings ...)
  2020-12-09  1:13   ` [PATCH v2 sl-b 3/5] mm: Make mem_dump_obj() handle vmalloc() memory paulmck
@ 2020-12-09  1:13   ` paulmck
  2020-12-09  1:13   ` [PATCH v2 sl-b 5/5] percpu_ref: Dump mem_dump_obj() info upon reference-count underflow paulmck
  2020-12-11  1:19   ` [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics Paul E. McKenney
  5 siblings, 0 replies; 49+ messages in thread
From: paulmck @ 2020-12-09  1:13 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Paul E. McKenney, Christoph Lameter, Pekka Enberg,
	David Rientjes, linux-mm

From: "Paul E. McKenney" <paulmck@kernel.org>

The debug-object double-free checks in __call_rcu() print out the
RCU callback function, which is usually sufficient to track down the
double free.  However, all uses of things like queue_rcu_work() will
have the same RCU callback function (rcu_work_rcufn() in this case),
so a diagnostic message for a double queue_rcu_work() needs more than
just the callback function.

This commit therefore calls mem_dump_obj() to dump out any additional
available information on the double-freed callback.

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b6c9c49..464cf14 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2957,6 +2957,7 @@ static void check_cb_ovld(struct rcu_data *rdp)
 static void
 __call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
+	static atomic_t doublefrees;
 	unsigned long flags;
 	struct rcu_data *rdp;
 	bool was_alldone;
@@ -2970,8 +2971,10 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
 		 * Use rcu:rcu_callback trace event to find the previous
 		 * time callback was passed to __call_rcu().
 		 */
-		WARN_ONCE(1, "__call_rcu(): Double-freed CB %p->%pS()!!!\n",
-			  head, head->func);
+		if (atomic_inc_return(&doublefrees) < 4) {
+			pr_err("%s(): Double-freed CB %p->%pS()!!!  ", __func__, head, head->func);
+			mem_dump_obj(head);
+		}
 		WRITE_ONCE(head->func, rcu_leak_callback);
 		return;
 	}
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 sl-b 5/5] percpu_ref: Dump mem_dump_obj() info upon reference-count underflow
  2020-12-09  1:11 ` [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics Paul E. McKenney
                     ` (3 preceding siblings ...)
  2020-12-09  1:13   ` [PATCH v2 sl-b 4/5] rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback paulmck
@ 2020-12-09  1:13   ` paulmck
  2020-12-11  1:19   ` [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics Paul E. McKenney
  5 siblings, 0 replies; 49+ messages in thread
From: paulmck @ 2020-12-09  1:13 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Paul E. McKenney, Ming Lei, Jens Axboe

From: "Paul E. McKenney" <paulmck@kernel.org>

Reference-count underflow for percpu_ref is detected in the RCU callback
percpu_ref_switch_to_atomic_rcu(), and the resulting warning does not
print anything allowing easy identification of which percpu_ref use
case is underflowing.  This is of course not normally a problem when
developing a new percpu_ref use case because it is most likely that
the problem resides in this new use case.  However, when deploying a
new kernel to a large set of servers, the underflow might well be a new
corner case in any of the old percpu_ref use cases.

This commit therefore calls mem_dump_obj() to dump out any additional
available information on the underflowing percpu_ref instance.

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 lib/percpu-refcount.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index e59eda0..a1071cd 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -5,6 +5,7 @@
 #include <linux/sched.h>
 #include <linux/wait.h>
 #include <linux/slab.h>
+#include <linux/mm.h>
 #include <linux/percpu-refcount.h>
 
 /*
@@ -168,6 +169,7 @@ static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu)
 			struct percpu_ref_data, rcu);
 	struct percpu_ref *ref = data->ref;
 	unsigned long __percpu *percpu_count = percpu_count_ptr(ref);
+	static atomic_t underflows;
 	unsigned long count = 0;
 	int cpu;
 
@@ -191,9 +193,13 @@ static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu)
 	 */
 	atomic_long_add((long)count - PERCPU_COUNT_BIAS, &data->count);
 
-	WARN_ONCE(atomic_long_read(&data->count) <= 0,
-		  "percpu ref (%ps) <= 0 (%ld) after switching to atomic",
-		  data->release, atomic_long_read(&data->count));
+	if (WARN_ONCE(atomic_long_read(&data->count) <= 0,
+		      "percpu ref (%ps) <= 0 (%ld) after switching to atomic",
+		      data->release, atomic_long_read(&data->count)) &&
+	    atomic_inc_return(&underflows) < 4) {
+		pr_err("%s(): percpu_ref underflow", __func__);
+		mem_dump_obj(data);
+	}
 
 	/* @ref is viewed as dead on all CPUs, send out switch confirmation */
 	percpu_ref_call_confirm_rcu(rcu);
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 1/5] mm: Add mem_dump_obj() to print source of memory block
  2020-12-09  1:12   ` [PATCH v2 sl-b 1/5] mm: Add mem_dump_obj() to print source of memory block paulmck
@ 2020-12-09  8:17     ` Christoph Hellwig
  2020-12-09 14:57       ` Paul E. McKenney
  2020-12-09 17:28     ` Vlastimil Babka
  2020-12-10 12:04     ` Joonsoo Kim
  2 siblings, 1 reply; 49+ messages in thread
From: Christoph Hellwig @ 2020-12-09  8:17 UTC (permalink / raw)
  To: paulmck
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Christoph Lameter, Pekka Enberg, David Rientjes, linux-mm

Your two new exports don't actually seem to get used in modular code
at all in this series.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 1/5] mm: Add mem_dump_obj() to print source of memory block
  2020-12-09  8:17     ` Christoph Hellwig
@ 2020-12-09 14:57       ` Paul E. McKenney
  2020-12-09 17:53         ` Christoph Hellwig
  0 siblings, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2020-12-09 14:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Christoph Lameter, Pekka Enberg, David Rientjes, linux-mm

On Wed, Dec 09, 2020 at 08:17:10AM +0000, Christoph Hellwig wrote:
> Your two new exports don't actually seem to get used in modular code
> at all in this series.

Indeed, and I either need to remove the exports or make my test code in
kernel/rcu/rcuscale.o suitable for upstreaming.  Or find the appropriate
mm/slab selftest location.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 1/5] mm: Add mem_dump_obj() to print source of memory block
  2020-12-09  1:12   ` [PATCH v2 sl-b 1/5] mm: Add mem_dump_obj() to print source of memory block paulmck
  2020-12-09  8:17     ` Christoph Hellwig
@ 2020-12-09 17:28     ` Vlastimil Babka
  2020-12-09 23:04       ` Paul E. McKenney
  2020-12-10 12:04     ` Joonsoo Kim
  2 siblings, 1 reply; 49+ messages in thread
From: Vlastimil Babka @ 2020-12-09 17:28 UTC (permalink / raw)
  To: paulmck, rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Christoph Lameter, Pekka Enberg, David Rientjes, linux-mm

On 12/9/20 2:12 AM, paulmck@kernel.org wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> There are kernel facilities such as per-CPU reference counts that give
> error messages in generic handlers or callbacks, whose messages are
> unenlightening.  In the case of per-CPU reference-count underflow, this
> is not a problem when creating a new use of this facility because in that
> case the bug is almost certainly in the code implementing that new use.
> However, trouble arises when deploying across many systems, which might
> exercise corner cases that were not seen during development and testing.
> Here, it would be really nice to get some kind of hint as to which of
> several uses the underflow was caused by.
> 
> This commit therefore exposes a mem_dump_obj() function that takes
> a pointer to memory (which must still be allocated if it has been
> dynamically allocated) and prints available information on where that
> memory came from.  This pointer can reference the middle of the block as
> well as the beginning of the block, as needed by things like RCU callback
> functions and timer handlers that might not know where the beginning of
> the memory block is.  These functions and handlers can use mem_dump_obj()
> to print out better hints as to where the problem might lie.

Sounds useful, yeah. It occurred to me at least once that we don't have a nice
generic way to print this kind of info. I usually dig it from a crash dump...

> The information printed can depend on kernel configuration.  For example,
> the allocation return address can be printed only for slab and slub,
> and even then only when the necessary debugging has been enabled.  For slab,
> build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
> to the next power of two or pass the SLAB_STORE_USER flag when creating
> the kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
> boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
> if more focused use is desired.  Also for slub, build with
> CONFIG_STACKTRACE=y to enable printing of the allocation-time stack trace.
> 
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: <linux-mm@kvack.org>
> Reported-by: Andrii Nakryiko <andrii@kernel.org>
> [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
> [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

...

> +/**
> + * kmem_valid_obj - does the pointer reference a valid slab object?
> + * @object: pointer to query.
> + *
> + * Return: %true if the pointer is to a not-yet-freed object from
> + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
> + * is to an already-freed object, and %false otherwise.
> + */

It should be possible to find out more about whether an object is free or not
than you currently do. At the very least it should be possible to find out if
it's definitely free. When it appears allocated, it can actually still be free
in some kind of per-cpu or per-node cache that would be infeasible to check.
But that improvement to the output can also be added later. Also, SLUB stores
the freeing stacktrace, which might be useful...
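
(A sketch of what that could look like in SLUB's kmem_provenance(),
assuming a new kp_free_ret field; get_track() and TRACK_FREE already
exist in mm/slub.c:

	if (s->flags & SLAB_STORE_USER) {
		trackp = get_track(s, objp, TRACK_FREE);
		kpp->kp_free_ret = (void *)trackp->addr; /* freeing return address */
	}
)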

> +bool kmem_valid_obj(void *object)
> +{
> +	struct page *page;
> +
> +	if (!virt_addr_valid(object))
> +		return false;
> +	page = virt_to_head_page(object);
> +	return PageSlab(page);
> +}
> +EXPORT_SYMBOL_GPL(kmem_valid_obj);
> +
> +/**
> + * kmem_dump_obj - Print available slab provenance information
> + * @object: slab object for which to find provenance information.
> + *
> + * This function uses pr_cont(), so that the caller is expected to have
> + * printed out whatever preamble is appropriate.  The provenance information
> + * depends on the type of object and on how much debugging is enabled.
> + * For a slab-cache object, the fact that it is a slab object is printed,
> + * and, if available, the slab name, return address, and stack trace from
> + * the allocation of that object.
> + *
> + * This function will splat if passed a pointer to a non-slab object.
> + * If you are not sure what type of object you have, you should instead
> + * use mem_dump_obj().
> + */
> +void kmem_dump_obj(void *object)
> +{
> +	int i;
> +	struct page *page;
> +	struct kmem_provenance kp;
> +
> +	if (WARN_ON_ONCE(!virt_addr_valid(object)))
> +		return;
> +	page = virt_to_head_page(object);
> +	if (WARN_ON_ONCE(!PageSlab(page))) {
> +		pr_cont(" non-slab memory.\n");
> +		return;
> +	}
> +	kp.kp_ptr = object;
> +	kp.kp_page = page;
> +	kp.kp_nstack = KS_ADDRS_COUNT;
> +	kmem_provenance(&kp);

You don't seem to be printing kp.kp_objp anywhere? (unless in later patch, but
would make sense in this patch already).

> +	if (page->slab_cache)
> +		pr_cont(" slab %s", page->slab_cache->name);
> +	else
> +		pr_cont(" slab ");
> +	if (kp.kp_ret)
> +		pr_cont(" allocated at %pS\n", kp.kp_ret);
> +	else
> +		pr_cont("\n");
> +	if (kp.kp_stack[0]) {
> +		for (i = 0; i < ARRAY_SIZE(kp.kp_stack); i++) {
> +			if (!kp.kp_stack[i])
> +				break;
> +			pr_info("    %pS\n", kp.kp_stack[i]);
> +		}
> +	}
> +}

...

> diff --git a/mm/slub.c b/mm/slub.c
> index b30be23..027fe0f 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3918,6 +3918,46 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
>  	return 0;
>  }
>  
> +void kmem_provenance(struct kmem_provenance *kpp)
> +{
> +#ifdef CONFIG_SLUB_DEBUG

I'd expect at least the very basic stuff (kp_objp) to be possible to determine
even under !CONFIG_SLUB_DEBUG?

> +	void *base;
> +	int i;
> +	void *object = kpp->kp_ptr;
> +	unsigned int objnr;
> +	void *objp;
> +	struct page *page = kpp->kp_page;
> +	struct kmem_cache *s = page->slab_cache;
> +	struct track *trackp;
> +
> +	base = page_address(page);
> +	objp = kasan_reset_tag(object);
> +	objp = restore_red_left(s, objp);
> +	objnr = obj_to_index(s, page, objp);
> +	objp = base + s->size * objnr;
> +	kpp->kp_objp = objp;
> +	if (WARN_ON_ONCE(objp < base || objp >= base + page->objects * s->size || (objp - base) % s->size) ||
> +	    !(s->flags & SLAB_STORE_USER))
> +		goto nodebug;
> +	trackp = get_track(s, objp, TRACK_ALLOC);
> +	kpp->kp_ret = (void *)trackp->addr;
> +#ifdef CONFIG_STACKTRACE
> +	for (i = 0; i < kpp->kp_nstack && i < TRACK_ADDRS_COUNT; i++) {
> +		kpp->kp_stack[i] = (void *)trackp->addrs[i];
> +		if (!kpp->kp_stack[i])
> +			break;
> +	}
> +#endif
> +	if (kpp->kp_stack && i < kpp->kp_nstack)
> +		kpp->kp_stack[i] = NULL;
> +	return;
> +nodebug:
> +#endif
> +	kpp->kp_ret = NULL;
> +	if (kpp->kp_nstack)
> +		kpp->kp_stack[0] = NULL;
> +}
> +
>  /********************************************************************
>   *		Kmalloc subsystem
>   *******************************************************************/
> diff --git a/mm/util.c b/mm/util.c
> index 4ddb6e1..d0e60d2 100644
> --- a/mm/util.c
> +++ b/mm/util.c

I think mm/debug.c is a better fit, as it already has dump_page() of a similar
nature. Also, you can call that from mem_dump_obj(), at least in the case when
the more specific handlers fail. It will even include page_owner info if enabled! :)

Thanks,
Vlastimil

> @@ -970,3 +970,28 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
>  	kunmap_atomic(addr1);
>  	return ret;
>  }
> +
> +/**
> + * mem_dump_obj - Print available provenance information
> + * @object: object for which to find provenance information.
> + *
> + * This function uses pr_cont(), so that the caller is expected to have
> + * printed out whatever preamble is appropriate.  The provenance information
> + * depends on the type of object and on how much debugging is enabled.
> + * For example, for a slab-cache object, the slab name is printed, and,
> + * if available, the return address and stack trace from the allocation
> + * of that object.
> + */
> +void mem_dump_obj(void *object)
> +{
> +	if (!virt_addr_valid(object)) {
> +		pr_cont(" non-paged (local) memory.\n");
> +		return;
> +	}
> +	if (kmem_valid_obj(object)) {
> +		kmem_dump_obj(object);
> +		return;
> +	}
> +	pr_cont(" non-slab memory.\n");
> +}
> +EXPORT_SYMBOL_GPL(mem_dump_obj);
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 2/5] mm: Make mem_dump_obj() handle NULL and zero-sized pointers
  2020-12-09  1:13   ` [PATCH v2 sl-b 2/5] mm: Make mem_dump_obj() handle NULL and zero-sized pointers paulmck
@ 2020-12-09 17:48     ` Vlastimil Babka
  2020-12-10  3:25       ` Paul E. McKenney
  0 siblings, 1 reply; 49+ messages in thread
From: Vlastimil Babka @ 2020-12-09 17:48 UTC (permalink / raw)
  To: paulmck, rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Christoph Lameter, Pekka Enberg, David Rientjes, linux-mm

On 12/9/20 2:13 AM, paulmck@kernel.org wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> This commit makes mem_dump_obj() call out NULL and zero-sized pointers
> specially instead of classifying them as non-paged memory.
> 
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: <linux-mm@kvack.org>
> Reported-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/util.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/util.c b/mm/util.c
> index d0e60d2..8c2449f 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -985,7 +985,12 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
>  void mem_dump_obj(void *object)
>  {
>  	if (!virt_addr_valid(object)) {
> -		pr_cont(" non-paged (local) memory.\n");
> +		if (object == NULL)
> +			pr_cont(" NULL pointer.\n");
> +		else if (object == ZERO_SIZE_PTR)
> +			pr_cont(" zero-size pointer.\n");
> +		else
> +			pr_cont(" non-paged (local) memory.\n");
>  		return;
>  	}
>  	if (kmem_valid_obj(object)) {
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 3/5] mm: Make mem_dump_obj() handle vmalloc() memory
  2020-12-09  1:13   ` [PATCH v2 sl-b 3/5] mm: Make mem_dump_obj() handle vmalloc() memory paulmck
@ 2020-12-09 17:51     ` Vlastimil Babka
  2020-12-09 19:39       ` Uladzislau Rezki
  2020-12-09 23:23       ` Paul E. McKenney
  2020-12-09 19:36     ` Uladzislau Rezki
  1 sibling, 2 replies; 49+ messages in thread
From: Vlastimil Babka @ 2020-12-09 17:51 UTC (permalink / raw)
  To: paulmck, rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii, linux-mm

On 12/9/20 2:13 AM, paulmck@kernel.org wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> This commit adds vmalloc() support to mem_dump_obj().  Note that the
> vmalloc_dump_obj() function combines the checking and dumping, in
> contrast with the split between kmem_valid_obj() and kmem_dump_obj().
> The reason for the difference is that the checking in the vmalloc()
> case involves acquiring a global lock, and redundant acquisitions of
> global locks should be avoided, even on not-so-fast paths.
> 
> Note that this change causes on-stack variables to be reported as
> vmalloc() storage from kernel_clone() or similar, depending on the degree
> of inlining that your compiler does.  This is likely more helpful than
> the earlier "non-paged (local) memory".
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: <linux-mm@kvack.org>
> Reported-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

...

> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3431,6 +3431,18 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  }
>  #endif	/* CONFIG_SMP */
>  
> +bool vmalloc_dump_obj(void *object)
> +{
> +	struct vm_struct *vm;
> +	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> +
> +	vm = find_vm_area(objp);
> +	if (!vm)
> +		return false;
> +	pr_cont(" vmalloc allocated at %pS\n", vm->caller);

Would it be useful to print the vm area boundaries too?
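
(For instance, vm_struct already carries the needed information, so a
sketch of one possible format would be:

	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
)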

> +	return true;
> +}
> +
>  #ifdef CONFIG_PROC_FS
>  static void *s_start(struct seq_file *m, loff_t *pos)
>  	__acquires(&vmap_purge_lock)
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 1/5] mm: Add mem_dump_obj() to print source of memory block
  2020-12-09 14:57       ` Paul E. McKenney
@ 2020-12-09 17:53         ` Christoph Hellwig
  2020-12-09 17:59           ` Paul E. McKenney
  0 siblings, 1 reply; 49+ messages in thread
From: Christoph Hellwig @ 2020-12-09 17:53 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Christoph Hellwig, rcu, linux-kernel, kernel-team, mingo,
	jiangshanlai, akpm, mathieu.desnoyers, josh, tglx, peterz,
	rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	iamjoonsoo.kim, andrii, Christoph Lameter, Pekka Enberg,
	David Rientjes, linux-mm

On Wed, Dec 09, 2020 at 06:57:02AM -0800, Paul E. McKenney wrote:
> On Wed, Dec 09, 2020 at 08:17:10AM +0000, Christoph Hellwig wrote:
> > Your two new exports don't actually seem to get used in modular code
> > at all in this series.
> 
> Indeed, and I either need to remove the exports or make my test code in
> kernel/rcu/rcuscale.o suitable for upstreaming.  Or find the appropriate
> mm/slab selftest location.

I'd rather not export something like this which pokes deep into
internals.  That being said, I've been working on and off on an
EXPORT_SYMBOL_FOR() that just exports a symbol to one specific module.
Hopefully I'll finish it for the next merge window, and with that
I'd feel much more comfortable with an export.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 1/5] mm: Add mem_dump_obj() to print source of memory block
  2020-12-09 17:53         ` Christoph Hellwig
@ 2020-12-09 17:59           ` Paul E. McKenney
  0 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2020-12-09 17:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Christoph Lameter, Pekka Enberg, David Rientjes, linux-mm

On Wed, Dec 09, 2020 at 05:53:06PM +0000, Christoph Hellwig wrote:
> On Wed, Dec 09, 2020 at 06:57:02AM -0800, Paul E. McKenney wrote:
> > On Wed, Dec 09, 2020 at 08:17:10AM +0000, Christoph Hellwig wrote:
> > > Your two new exports don't actually seem to get used in modular code
> > > at all in this series.
> > 
> > Indeed, and I either need to remove the exports or make my test code in
> > kernel/rcu/rcuscale.o suitable for upstreaming.  Or find the appropriate
> > mm/slab selftest location.
> 
> I'd rather not export something like this which pokes deep into
> internals.  That being said, I've been working on and off on an
> EXPORT_SYMBOL_FOR() that just exports a symbol to one specific module.
> Hopefully I'll finish it for the next merge window, and with that
> I'd feel much more comfortable with an export.

That would be really useful!  I have a number of symbols that should
only be used by a few specific in-tree modules, independent of this
patch series.

For my part, I will see if there is a good mm-related location for this
sort of selftest.
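
(Presumably usage would be something along these lines, with the second
argument naming the consuming module; purely speculative, given that the
macro does not yet exist:

	EXPORT_SYMBOL_FOR(kmem_dump_obj, rcuscale);
)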

							Thanx, Paul

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 3/5] mm: Make mem_dump_obj() handle vmalloc() memory
  2020-12-09  1:13   ` [PATCH v2 sl-b 3/5] mm: Make mem_dump_obj() handle vmalloc() memory paulmck
  2020-12-09 17:51     ` Vlastimil Babka
@ 2020-12-09 19:36     ` Uladzislau Rezki
  2020-12-09 19:42       ` Paul E. McKenney
  1 sibling, 1 reply; 49+ messages in thread
From: Uladzislau Rezki @ 2020-12-09 19:36 UTC (permalink / raw)
  To: paulmck
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii, linux-mm

On Tue, Dec 08, 2020 at 05:13:01PM -0800, paulmck@kernel.org wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> This commit adds vmalloc() support to mem_dump_obj().  Note that the
> vmalloc_dump_obj() function combines the checking and dumping, in
> contrast with the split between kmem_valid_obj() and kmem_dump_obj().
> The reason for the difference is that the checking in the vmalloc()
> case involves acquiring a global lock, and redundant acquisitions of
> global locks should be avoided, even on not-so-fast paths.
> 
> Note that this change causes on-stack variables to be reported as
> vmalloc() storage from kernel_clone() or similar, depending on the degree
> of inlining that your compiler does.  This is likely more helpful than
> the earlier "non-paged (local) memory".
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: <linux-mm@kvack.org>
> Reported-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>  include/linux/vmalloc.h |  6 ++++++
>  mm/util.c               | 12 +++++++-----
>  mm/vmalloc.c            | 12 ++++++++++++
>  3 files changed, 25 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 938eaf9..c89c2be 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -248,4 +248,10 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  int register_vmap_purge_notifier(struct notifier_block *nb);
>  int unregister_vmap_purge_notifier(struct notifier_block *nb);
>  
> +#ifdef CONFIG_MMU
> +bool vmalloc_dump_obj(void *object);
> +#else
> +static inline bool vmalloc_dump_obj(void *object) { return false; }
> +#endif
> +
>  #endif /* _LINUX_VMALLOC_H */
> diff --git a/mm/util.c b/mm/util.c
> index 8c2449f..ee99a0a 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -984,6 +984,12 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
>   */
>  void mem_dump_obj(void *object)
>  {
> +	if (kmem_valid_obj(object)) {
> +		kmem_dump_obj(object);
> +		return;
> +	}
> +	if (vmalloc_dump_obj(object))
> +		return;
>  	if (!virt_addr_valid(object)) {
>  		if (object == NULL)
>  			pr_cont(" NULL pointer.\n");
> @@ -993,10 +999,6 @@ void mem_dump_obj(void *object)
>  			pr_cont(" non-paged (local) memory.\n");
>  		return;
>  	}
> -	if (kmem_valid_obj(object)) {
> -		kmem_dump_obj(object);
> -		return;
> -	}
> -	pr_cont(" non-slab memory.\n");
> +	pr_cont(" non-slab/vmalloc memory.\n");
>  }
>  EXPORT_SYMBOL_GPL(mem_dump_obj);
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 6ae491a..7421719 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3431,6 +3431,18 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  }
>  #endif	/* CONFIG_SMP */
>  
> +bool vmalloc_dump_obj(void *object)
> +{
> +	struct vm_struct *vm;
> +	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
>
Paul, vmalloced addresses are already aligned to PAGE_SIZE, so that one
is odd.

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 3/5] mm: Make mem_dump_obj() handle vmalloc() memory
  2020-12-09 17:51     ` Vlastimil Babka
@ 2020-12-09 19:39       ` Uladzislau Rezki
  2020-12-09 23:23       ` Paul E. McKenney
  1 sibling, 0 replies; 49+ messages in thread
From: Uladzislau Rezki @ 2020-12-09 19:39 UTC (permalink / raw)
  To: Vlastimil Babka, paulmck
  Cc: paulmck, rcu, linux-kernel, kernel-team, mingo, jiangshanlai,
	akpm, mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii, linux-mm

On Wed, Dec 09, 2020 at 06:51:20PM +0100, Vlastimil Babka wrote:
> On 12/9/20 2:13 AM, paulmck@kernel.org wrote:
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > This commit adds vmalloc() support to mem_dump_obj().  Note that the
> > vmalloc_dump_obj() function combines the checking and dumping, in
> > contrast with the split between kmem_valid_obj() and kmem_dump_obj().
> > The reason for the difference is that the checking in the vmalloc()
> > case involves acquiring a global lock, and redundant acquisitions of
> > global locks should be avoided, even on not-so-fast paths.
> > 
> > Note that this change causes on-stack variables to be reported as
> > vmalloc() storage from kernel_clone() or similar, depending on the degree
> > of inlining that your compiler does.  This is likely more helpful than
> > the earlier "non-paged (local) memory".
> > 
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Cc: <linux-mm@kvack.org>
> > Reported-by: Andrii Nakryiko <andrii@kernel.org>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> ...
> 
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3431,6 +3431,18 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> >  }
> >  #endif	/* CONFIG_SMP */
> >  
> > +bool vmalloc_dump_obj(void *object)
> > +{
> > +	struct vm_struct *vm;
> > +	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > +
> > +	vm = find_vm_area(objp);
> > +	if (!vm)
> > +		return false;
> > +	pr_cont(" vmalloc allocated at %pS\n", vm->caller);
> 
> Would it be useful to print the vm area boundaries too?
> 
Do you mean va_start/va_end information?

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 3/5] mm: Make mem_dump_obj() handle vmalloc() memory
  2020-12-09 19:36     ` Uladzislau Rezki
@ 2020-12-09 19:42       ` Paul E. McKenney
  2020-12-09 20:04         ` Uladzislau Rezki
  0 siblings, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2020-12-09 19:42 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii, linux-mm

On Wed, Dec 09, 2020 at 08:36:37PM +0100, Uladzislau Rezki wrote:
> On Tue, Dec 08, 2020 at 05:13:01PM -0800, paulmck@kernel.org wrote:
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > This commit adds vmalloc() support to mem_dump_obj().  Note that the
> > vmalloc_dump_obj() function combines the checking and dumping, in
> > contrast with the split between kmem_valid_obj() and kmem_dump_obj().
> > The reason for the difference is that the checking in the vmalloc()
> > case involves acquiring a global lock, and redundant acquisitions of
> > global locks should be avoided, even on not-so-fast paths.
> > 
> > Note that this change causes on-stack variables to be reported as
> > vmalloc() storage from kernel_clone() or similar, depending on the degree
> > of inlining that your compiler does.  This is likely more helpful than
> > the earlier "non-paged (local) memory".
> > 
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Cc: <linux-mm@kvack.org>
> > Reported-by: Andrii Nakryiko <andrii@kernel.org>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> >  include/linux/vmalloc.h |  6 ++++++
> >  mm/util.c               | 12 +++++++-----
> >  mm/vmalloc.c            | 12 ++++++++++++
> >  3 files changed, 25 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > index 938eaf9..c89c2be 100644
> > --- a/include/linux/vmalloc.h
> > +++ b/include/linux/vmalloc.h
> > @@ -248,4 +248,10 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> >  int register_vmap_purge_notifier(struct notifier_block *nb);
> >  int unregister_vmap_purge_notifier(struct notifier_block *nb);
> >  
> > +#ifdef CONFIG_MMU
> > +bool vmalloc_dump_obj(void *object);
> > +#else
> > +static inline bool vmalloc_dump_obj(void *object) { return false; }
> > +#endif
> > +
> >  #endif /* _LINUX_VMALLOC_H */
> > diff --git a/mm/util.c b/mm/util.c
> > index 8c2449f..ee99a0a 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -984,6 +984,12 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
> >   */
> >  void mem_dump_obj(void *object)
> >  {
> > +	if (kmem_valid_obj(object)) {
> > +		kmem_dump_obj(object);
> > +		return;
> > +	}
> > +	if (vmalloc_dump_obj(object))
> > +		return;
> >  	if (!virt_addr_valid(object)) {
> >  		if (object == NULL)
> >  			pr_cont(" NULL pointer.\n");
> > @@ -993,10 +999,6 @@ void mem_dump_obj(void *object)
> >  			pr_cont(" non-paged (local) memory.\n");
> >  		return;
> >  	}
> > -	if (kmem_valid_obj(object)) {
> > -		kmem_dump_obj(object);
> > -		return;
> > -	}
> > -	pr_cont(" non-slab memory.\n");
> > +	pr_cont(" non-slab/vmalloc memory.\n");
> >  }
> >  EXPORT_SYMBOL_GPL(mem_dump_obj);
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 6ae491a..7421719 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3431,6 +3431,18 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> >  }
> >  #endif	/* CONFIG_SMP */
> >  
> > +bool vmalloc_dump_obj(void *object)
> > +{
> > +	struct vm_struct *vm;
> > +	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> >
> Paul, vmalloced addresses are already aligned to PAGE_SIZE, so that one
> is odd.

They are, but this is to handle things like this:

	struct foo {
		int a;
		struct rcu_head rh;
	};

	void silly(struct foo *fp)
	{
		call_rcu(&fp->rh, my_rcu_cb);
		call_rcu(&fp->rh, my_other_rcu_cb); /* Double free of fp->rh! */
	}

In kernels built with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, this would
result in a call to mem_dump_obj() and then to vmalloc_dump_obj()
with a non-page-aligned pointer.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 3/5] mm: Make mem_dump_obj() handle vmalloc() memory
  2020-12-09 19:42       ` Paul E. McKenney
@ 2020-12-09 20:04         ` Uladzislau Rezki
  0 siblings, 0 replies; 49+ messages in thread
From: Uladzislau Rezki @ 2020-12-09 20:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, rcu, linux-kernel, kernel-team, mingo,
	jiangshanlai, akpm, mathieu.desnoyers, josh, tglx, peterz,
	rostedt, dhowells, edumazet, fweisbec, oleg, joel,
	iamjoonsoo.kim, andrii, linux-mm

On Wed, Dec 09, 2020 at 11:42:39AM -0800, Paul E. McKenney wrote:
> On Wed, Dec 09, 2020 at 08:36:37PM +0100, Uladzislau Rezki wrote:
> > On Tue, Dec 08, 2020 at 05:13:01PM -0800, paulmck@kernel.org wrote:
> > > From: "Paul E. McKenney" <paulmck@kernel.org>
> > > 
> > > This commit adds vmalloc() support to mem_dump_obj().  Note that the
> > > vmalloc_dump_obj() function combines the checking and dumping, in
> > > contrast with the split between kmem_valid_obj() and kmem_dump_obj().
> > > The reason for the difference is that the checking in the vmalloc()
> > > case involves acquiring a global lock, and redundant acquisitions of
> > > global locks should be avoided, even on not-so-fast paths.
> > > 
> > > Note that this change causes on-stack variables to be reported as
> > > vmalloc() storage from kernel_clone() or similar, depending on the degree
> > > of inlining that your compiler does.  This is likely more helpful than
> > > the earlier "non-paged (local) memory".
> > > 
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > Cc: <linux-mm@kvack.org>
> > > Reported-by: Andrii Nakryiko <andrii@kernel.org>
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > ---
> > >  include/linux/vmalloc.h |  6 ++++++
> > >  mm/util.c               | 12 +++++++-----
> > >  mm/vmalloc.c            | 12 ++++++++++++
> > >  3 files changed, 25 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > > index 938eaf9..c89c2be 100644
> > > --- a/include/linux/vmalloc.h
> > > +++ b/include/linux/vmalloc.h
> > > @@ -248,4 +248,10 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > >  int register_vmap_purge_notifier(struct notifier_block *nb);
> > >  int unregister_vmap_purge_notifier(struct notifier_block *nb);
> > >  
> > > +#ifdef CONFIG_MMU
> > > +bool vmalloc_dump_obj(void *object);
> > > +#else
> > > +static inline bool vmalloc_dump_obj(void *object) { return false; }
> > > +#endif
> > > +
> > >  #endif /* _LINUX_VMALLOC_H */
> > > diff --git a/mm/util.c b/mm/util.c
> > > index 8c2449f..ee99a0a 100644
> > > --- a/mm/util.c
> > > +++ b/mm/util.c
> > > @@ -984,6 +984,12 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
> > >   */
> > >  void mem_dump_obj(void *object)
> > >  {
> > > +	if (kmem_valid_obj(object)) {
> > > +		kmem_dump_obj(object);
> > > +		return;
> > > +	}
> > > +	if (vmalloc_dump_obj(object))
> > > +		return;
> > >  	if (!virt_addr_valid(object)) {
> > >  		if (object == NULL)
> > >  			pr_cont(" NULL pointer.\n");
> > > @@ -993,10 +999,6 @@ void mem_dump_obj(void *object)
> > >  			pr_cont(" non-paged (local) memory.\n");
> > >  		return;
> > >  	}
> > > -	if (kmem_valid_obj(object)) {
> > > -		kmem_dump_obj(object);
> > > -		return;
> > > -	}
> > > -	pr_cont(" non-slab memory.\n");
> > > +	pr_cont(" non-slab/vmalloc memory.\n");
> > >  }
> > >  EXPORT_SYMBOL_GPL(mem_dump_obj);
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 6ae491a..7421719 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -3431,6 +3431,18 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > >  }
> > >  #endif	/* CONFIG_SMP */
> > >  
> > > +bool vmalloc_dump_obj(void *object)
> > > +{
> > > +	struct vm_struct *vm;
> > > +	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > >
> > Paul, vmalloced addresses are already aligned to PAGE_SIZE, so that one
> > is odd.
> 
> They are, but this is to handle things like this:
> 
> 	struct foo {
> 		int a;
> 		struct rcu_head rh;
> 	};
> 
> 	void silly(struct foo *fp)
> 	{
> 		call_rcu(&fp->rh, my_rcu_cb);
> 		call_rcu(&fp->rh, my_other_rcu_cb);
> 	}
> 
> In kernels built with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, this would
> result in a call to mem_dump_obj() and then to vmalloc_dump_obj()
> with a non-page-aligned pointer.
> 
OK, I got it. I thought the function dealt with the original vmalloc
pointer. In fact it does not :)

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 1/5] mm: Add mem_dump_obj() to print source of memory block
  2020-12-09 17:28     ` Vlastimil Babka
@ 2020-12-09 23:04       ` Paul E. McKenney
  2020-12-10 10:48         ` Vlastimil Babka
  0 siblings, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2020-12-09 23:04 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Christoph Lameter, Pekka Enberg, David Rientjes, linux-mm

On Wed, Dec 09, 2020 at 06:28:50PM +0100, Vlastimil Babka wrote:
> On 12/9/20 2:12 AM, paulmck@kernel.org wrote:
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > There are kernel facilities such as per-CPU reference counts that give
> > error messages in generic handlers or callbacks, whose messages are
> > unenlightening.  In the case of per-CPU reference-count underflow, this
> > is not a problem when creating a new use of this facility because in that
> > case the bug is almost certainly in the code implementing that new use.
> > However, trouble arises when deploying across many systems, which might
> > exercise corner cases that were not seen during development and testing.
> > Here, it would be really nice to get some kind of hint as to which of
> > several uses the underflow was caused by.
> > 
> > This commit therefore exposes a mem_dump_obj() function that takes
> > a pointer to memory (which must still be allocated if it has been
> > dynamically allocated) and prints available information on where that
> > memory came from.  This pointer can reference the middle of the block as
> > well as the beginning of the block, as needed by things like RCU callback
> > functions and timer handlers that might not know where the beginning of
> > the memory block is.  These functions and handlers can use mem_dump_obj()
> > to print out better hints as to where the problem might lie.
> 
> Sounds useful, yeah. It occurred to me at least once that we don't have a nice
> generic way to print this kind of info. I usually dig it from a crash dump...

Glad to hear that it might be helpful, and thank you for looking this
over!

> > The information printed can depend on kernel configuration.  For example,
> > the allocation return address can be printed only for slab and slub,
> > and even then only when the necessary debug has been enabled.  For slab,
> > build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
> > > to the next power of two or use the SLAB_STORE_USER flag when creating the
> > kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
> > boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
> > if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
> > to enable printing of the allocation-time stack trace.
> > 
> > Cc: Christoph Lameter <cl@linux.com>
> > Cc: Pekka Enberg <penberg@kernel.org>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: <linux-mm@kvack.org>
> > Reported-by: Andrii Nakryiko <andrii@kernel.org>
> > [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
> > [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> ...
> 
> > +/**
> > + * kmem_valid_obj - does the pointer reference a valid slab object?
> > + * @object: pointer to query.
> > + *
> > + * Return: %true if the pointer is to a not-yet-freed object from
> > + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
> > + * is to an already-freed object, and %false otherwise.
> > + */
> 
> It should be possible to find out more about the object being free or not than
> you currently do. At least to find out if it's definitely free. When it appears
> allocated, it can actually still be free in some kind of e.g. per-cpu or
> per-node cache that would be infeasible to check. But that improvement to the
> output can also be added later. Also SLUB stores the freeing stacktrace, which
> might be useful...

I can see how this could help debugging a use-after-free situation,
at least as long as the poor sap that subsequently allocated it doesn't
free it.

I can easily add more fields to the kmem_provenance structure.  Maybe
it would make sense to have another exported API that you provide a
kmem_provenance structure to, and it fills it in.

One caution though...  I rely on the object being allocated.
If it officially might already be freed, complex and high-overhead
synchronization seems to be required to safely access the various data
structures.

So any use on an already-freed object is on a "you break it you get to
keep the pieces" basis.  On the other hand, if you are dealing with a
use-after-free situation, life is hard anyway.

Or am I missing your point?
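
For concreteness, the caller-supplied-structure API floated above might
look something like the following sketch.  The name and signature are
illustrative only; nothing like this appears in the posted series:

	/* Hypothetical: fill in a caller-supplied structure instead of
	 * printing.  Returns false if @object is not a slab object. */
	bool kmem_obj_provenance(void *object, struct kmem_provenance *kpp);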

> > +bool kmem_valid_obj(void *object)
> > +{
> > +	struct page *page;
> > +
> > +	if (!virt_addr_valid(object))
> > +		return false;
> > +	page = virt_to_head_page(object);
> > +	return PageSlab(page);
> > +}
> > +EXPORT_SYMBOL_GPL(kmem_valid_obj);
> > +
> > +/**
> > + * kmem_dump_obj - Print available slab provenance information
> > + * @object: slab object for which to find provenance information.
> > + *
> > + * This function uses pr_cont(), so that the caller is expected to have
> > + * printed out whatever preamble is appropriate.  The provenance information
> > + * depends on the type of object and on how much debugging is enabled.
> > + * For a slab-cache object, the fact that it is a slab object is printed,
> > + * and, if available, the slab name, return address, and stack trace from
> > + * the allocation of that object.
> > + *
> > + * This function will splat if passed a pointer to a non-slab object.
> > + * If you are not sure what type of object you have, you should instead
> > + * use mem_dump_obj().
> > + */
> > +void kmem_dump_obj(void *object)
> > +{
> > +	int i;
> > +	struct page *page;
> > +	struct kmem_provenance kp;
> > +
> > +	if (WARN_ON_ONCE(!virt_addr_valid(object)))
> > +		return;
> > +	page = virt_to_head_page(object);
> > +	if (WARN_ON_ONCE(!PageSlab(page))) {
> > +		pr_cont(" non-slab memory.\n");
> > +		return;
> > +	}
> > +	kp.kp_ptr = object;
> > +	kp.kp_page = page;
> > +	kp.kp_nstack = KS_ADDRS_COUNT;
> > +	kmem_provenance(&kp);
> 
> You don't seem to be printing kp.kp_objp anywhere? (unless in later patch, but
> would make sense in this patch already).

Good point!

However, please note that various debugging options reserve space at
the beginning of the object.  This can make the meaning of kp.kp_objp a
bit different than one might expect.
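
As a rough illustration of that layout (assuming SLUB-style red zoning;
details vary with the allocator and the debug options in effect):

	/*
	 * base + s->size * objnr
	 * |
	 * v
	 * [left red zone][object payload][right red zone][track data]
	 * ^ kp_objp      ^ pointer that kmalloc() returned
	 */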

> > +	if (page->slab_cache)
> > +		pr_cont(" slab %s", page->slab_cache->name);
> > +	else
> > +		pr_cont(" slab ");
> > +	if (kp.kp_ret)
> > +		pr_cont(" allocated at %pS\n", kp.kp_ret);
> > +	else
> > +		pr_cont("\n");
> > +	if (kp.kp_stack[0]) {
> > +		for (i = 0; i < ARRAY_SIZE(kp.kp_stack); i++) {
> > +			if (!kp.kp_stack[i])
> > +				break;
> > +			pr_info("    %pS\n", kp.kp_stack[i]);
> > +		}
> > +	}
> > +}
> 
> ...
> 
> > diff --git a/mm/slub.c b/mm/slub.c
> > index b30be23..027fe0f 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3918,6 +3918,46 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
> >  	return 0;
> >  }
> >  
> > +void kmem_provenance(struct kmem_provenance *kpp)
> > +{
> > +#ifdef CONFIG_SLUB_DEBUG
> 
> I'd expect at least the very basic stuff (kp_obj) to be possible to determine
> even under !CONFIG_SLUB_DEBUG?

And doing it that way even saves a line of code!  ;-)
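
A minimal sketch of what that !CONFIG_SLUB_DEBUG fallback might look
like, assuming the non-debug layout puts no red zone in front of the
object:

	objnr = obj_to_index(s, page, object);
	kpp->kp_objp = page_address(page) + s->size * objnr;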

> > +	void *base;
> > +	int i;
> > +	void *object = kpp->kp_ptr;
> > +	unsigned int objnr;
> > +	void *objp;
> > +	struct page *page = kpp->kp_page;
> > +	struct kmem_cache *s = page->slab_cache;
> > +	struct track *trackp;
> > +
> > +	base = page_address(page);
> > +	objp = kasan_reset_tag(object);
> > +	objp = restore_red_left(s, objp);
> > +	objnr = obj_to_index(s, page, objp);
> > +	objp = base + s->size * objnr;
> > +	kpp->kp_objp = objp;
> > +	if (WARN_ON_ONCE(objp < base || objp >= base + page->objects * s->size || (objp - base) % s->size) ||
> > +	    !(s->flags & SLAB_STORE_USER))
> > +		goto nodebug;
> > +	trackp = get_track(s, objp, TRACK_ALLOC);
> > +	kpp->kp_ret = (void *)trackp->addr;
> > +#ifdef CONFIG_STACKTRACE
> > +	for (i = 0; i < kpp->kp_nstack && i < TRACK_ADDRS_COUNT; i++) {
> > +		kpp->kp_stack[i] = (void *)trackp->addrs[i];
> > +		if (!kpp->kp_stack[i])
> > +			break;
> > +	}
> > +#endif
> > +	if (kpp->kp_stack && i < kpp->kp_nstack)
> > +		kpp->kp_stack[i] = NULL;
> > +	return;
> > +nodebug:
> > +#endif
> > +	kpp->kp_ret = NULL;
> > +	if (kpp->kp_nstack)
> > +		kpp->kp_stack[0] = NULL;
> > +}
> > +
> >  /********************************************************************
> >   *		Kmalloc subsystem
> >   *******************************************************************/
> > diff --git a/mm/util.c b/mm/util.c
> > index 4ddb6e1..d0e60d2 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> 
> I think mm/debug.c is a better fit as it already has dump_page() of a similar
> nature. Also you can call that from from mem_dump_obj() at least in case when
> the more specific handlers fail. It will even include page_owner info if enabled! :)

I will count this as one vote for mm/debug.c.

Two things to consider, though...  First, Joonsoo suggests that the fact
that this produces useful information without any debugging information
enabled means that it is not really debugging as such.  Second, mm/debug.c
does not include either slab.h or vmalloc.h.  The second point might not
be a showstopper, but I was interpreting it to mean that the file's role
was less central.

							Thanx, Paul

> Thanks,
> Vlastimil
> 
> > @@ -970,3 +970,28 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
> >  	kunmap_atomic(addr1);
> >  	return ret;
> >  }
> > +
> > +/**
> > + * mem_dump_obj - Print available provenance information
> > + * @object: object for which to find provenance information.
> > + *
> > + * This function uses pr_cont(), so that the caller is expected to have
> > + * printed out whatever preamble is appropriate.  The provenance information
> > + * depends on the type of object and on how much debugging is enabled.
> > + * For example, for a slab-cache object, the slab name is printed, and,
> > + * if available, the return address and stack trace from the allocation
> > + * of that object.
> > + */
> > +void mem_dump_obj(void *object)
> > +{
> > +	if (!virt_addr_valid(object)) {
> > +		pr_cont(" non-paged (local) memory.\n");
> > +		return;
> > +	}
> > +	if (kmem_valid_obj(object)) {
> > +		kmem_dump_obj(object);
> > +		return;
> > +	}
> > +	pr_cont(" non-slab memory.\n");
> > +}
> > +EXPORT_SYMBOL_GPL(mem_dump_obj);
> > 
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 3/5] mm: Make mem_dump_obj() handle vmalloc() memory
  2020-12-09 17:51     ` Vlastimil Babka
  2020-12-09 19:39       ` Uladzislau Rezki
@ 2020-12-09 23:23       ` Paul E. McKenney
  2020-12-10 10:49         ` Vlastimil Babka
  1 sibling, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2020-12-09 23:23 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii, linux-mm

On Wed, Dec 09, 2020 at 06:51:20PM +0100, Vlastimil Babka wrote:
> On 12/9/20 2:13 AM, paulmck@kernel.org wrote:
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > This commit adds vmalloc() support to mem_dump_obj().  Note that the
> > vmalloc_dump_obj() function combines the checking and dumping, in
> > contrast with the split between kmem_valid_obj() and kmem_dump_obj().
> > The reason for the difference is that the checking in the vmalloc()
> > case involves acquiring a global lock, and redundant acquisitions of
> > global locks should be avoided, even on not-so-fast paths.
> > 
> > Note that this change causes on-stack variables to be reported as
> > vmalloc() storage from kernel_clone() or similar, depending on the degree
> > of inlining that your compiler does.  This is likely more helpful than
> > the earlier "non-paged (local) memory".
> > 
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Cc: <linux-mm@kvack.org>
> > Reported-by: Andrii Nakryiko <andrii@kernel.org>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> ...
> 
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3431,6 +3431,18 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> >  }
> >  #endif	/* CONFIG_SMP */
> >  
> > +bool vmalloc_dump_obj(void *object)
> > +{
> > +	struct vm_struct *vm;
> > +	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > +
> > +	vm = find_vm_area(objp);
> > +	if (!vm)
> > +		return false;
> > +	pr_cont(" vmalloc allocated at %pS\n", vm->caller);
> 
> Would it be useful to print the vm area boundaries too?

Like this?

I also considered instead using vm->size, but that always seems to include
an extra page, so a 4-page span is listed as having 20480 bytes and a
one-page span is 8192 bytes.  This might be more accurate in some sense,
but would be quite confusing to someone trying to compare this size with
that requested in the vmalloc() call.
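
Concretely, using the numbers above (and assuming 4K pages with the
usual one guard page per area):

	/* For vmalloc(4 * PAGE_SIZE):
	 *   vm->nr_pages             == 4
	 *   vm->nr_pages * PAGE_SIZE == 16384  <- matches the request
	 *   vm->size                 == 20480  <- includes the guard page
	 */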

							Thanx, Paul

------------------------------------------------------------------------

commit 33e0469c289c2f78e5f0d0c463c8ee3357d273c0
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed Dec 9 15:15:27 2020 -0800

    mm: Make mem_dump_obj() vmalloc() dumps include start and length
    
    This commit adds the starting address and number of pages to the vmalloc()
    information dumped by way of vmalloc_dump_obj().
    
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: <linux-mm@kvack.org>
    Reported-by: Andrii Nakryiko <andrii@kernel.org>
    Suggested-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7421719..77b1100 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3439,7 +3439,8 @@ bool vmalloc_dump_obj(void *object)
 	vm = find_vm_area(objp);
 	if (!vm)
 		return false;
-	pr_cont(" vmalloc allocated at %pS\n", vm->caller);
+	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
+		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
 	return true;
 }
 

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 2/5] mm: Make mem_dump_obj() handle NULL and zero-sized pointers
  2020-12-09 17:48     ` Vlastimil Babka
@ 2020-12-10  3:25       ` Paul E. McKenney
  0 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2020-12-10  3:25 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Christoph Lameter, Pekka Enberg, David Rientjes, linux-mm

On Wed, Dec 09, 2020 at 06:48:47PM +0100, Vlastimil Babka wrote:
> On 12/9/20 2:13 AM, paulmck@kernel.org wrote:
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > This commit makes mem_dump_obj() call out NULL and zero-sized pointers
> > specially instead of classifying them as non-paged memory.
> > 
> > Cc: Christoph Lameter <cl@linux.com>
> > Cc: Pekka Enberg <penberg@kernel.org>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: <linux-mm@kvack.org>
> > Reported-by: Andrii Nakryiko <andrii@kernel.org>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Applied, thank you!

						Thanx, Paul

> > ---
> >  mm/util.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/util.c b/mm/util.c
> > index d0e60d2..8c2449f 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -985,7 +985,12 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
> >  void mem_dump_obj(void *object)
> >  {
> >  	if (!virt_addr_valid(object)) {
> > -		pr_cont(" non-paged (local) memory.\n");
> > +		if (object == NULL)
> > +			pr_cont(" NULL pointer.\n");
> > +		else if (object == ZERO_SIZE_PTR)
> > +			pr_cont(" zero-size pointer.\n");
> > +		else
> > +			pr_cont(" non-paged (local) memory.\n");
> >  		return;
> >  	}
> >  	if (kmem_valid_obj(object)) {
> > 
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 1/5] mm: Add mem_dump_obj() to print source of memory block
  2020-12-09 23:04       ` Paul E. McKenney
@ 2020-12-10 10:48         ` Vlastimil Babka
  2020-12-10 19:56           ` Paul E. McKenney
  0 siblings, 1 reply; 49+ messages in thread
From: Vlastimil Babka @ 2020-12-10 10:48 UTC (permalink / raw)
  To: paulmck
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Christoph Lameter, Pekka Enberg, David Rientjes, linux-mm

On 12/10/20 12:04 AM, Paul E. McKenney wrote:
>> > +/**
>> > + * kmem_valid_obj - does the pointer reference a valid slab object?
>> > + * @object: pointer to query.
>> > + *
>> > + * Return: %true if the pointer is to a not-yet-freed object from
>> > + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
>> > + * is to an already-freed object, and %false otherwise.
>> > + */
>> 
>> It should be possible to find out more about the object being free or not than
>> you currently do. At least to find out if it's definitely free. When it appears
>> allocated, it can actually still be free in some kind of e.g. per-cpu or
>> per-node cache that would be infeasible to check. But that improvement to the
>> output can also be added later. Also SLUB stores the freeing stacktrace, which
>> might be useful...
> 
> I can see how this could help debugging a use-after-free situation,
> at least as long as the poor sap that subsequently allocated it doesn't
> free it.
> 
> I can easily add more fields to the kmem_provenance structure.  Maybe
> it would make sense to have another exported API that you provide a
> kmem_provenance structure to, and it fills it in.
> 
> One caution though...  I rely on the object being allocated.
> If it officially might already be freed, complex and high-overhead
> synchronization seems to be required to safely access the various data
> structures.

Good point! It's easy to forget that when one is used to similar digging in a
crash dump, where nothing changes.

> So any use on an already-freed object is on a "you break it you get to
> keep the pieces" basis.  On the other hand, if you are dealing with a
> use-after-free situation, life is hard anyway.

Yeah, even now I think it's potentially dangerous, as you can get
kmem_valid_obj() returning true because PageSlab(page) is true. But the object
might already be free, so as soon as another CPU frees another object from the
same slab page, the page also gets freed... or it was already freed and then
allocated by another slab, so it's PageSlab() again.
I guess at least some safety could be achieved by pinning the page with
get_page_unless_zero. But maybe your current implementation is already safe;
I need to check in detail.
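
A sketch of that pinning idea, purely illustrative rather than from the
posted series, and assuming the PageSlab() test is redone under the pin:

	page = virt_to_head_page(object);
	if (!get_page_unless_zero(page))
		return false;
	if (!PageSlab(page)) {
		put_page(page);
		return false;
	}
	/* ... inspect page->slab_cache and friends while pinned ... */
	put_page(page);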

> Or am I missing your point?
> 
>> > +bool kmem_valid_obj(void *object)
>> > +{
>> > +	struct page *page;
>> > +
>> > +	if (!virt_addr_valid(object))
>> > +		return false;
>> > +	page = virt_to_head_page(object);
>> > +	return PageSlab(page);
>> > +}
>> > +EXPORT_SYMBOL_GPL(kmem_valid_obj);
>> > +
>> > +/**
>> > + * kmem_dump_obj - Print available slab provenance information
>> > + * @object: slab object for which to find provenance information.
>> > + *
>> > + * This function uses pr_cont(), so that the caller is expected to have
>> > + * printed out whatever preamble is appropriate.  The provenance information
>> > + * depends on the type of object and on how much debugging is enabled.
>> > + * For a slab-cache object, the fact that it is a slab object is printed,
>> > + * and, if available, the slab name, return address, and stack trace from
>> > + * the allocation of that object.
>> > + *
>> > + * This function will splat if passed a pointer to a non-slab object.
>> > + * If you are not sure what type of object you have, you should instead
>> > + * use mem_dump_obj().
>> > + */
>> > +void kmem_dump_obj(void *object)
>> > +{
>> > +	int i;
>> > +	struct page *page;
>> > +	struct kmem_provenance kp;
>> > +
>> > +	if (WARN_ON_ONCE(!virt_addr_valid(object)))
>> > +		return;
>> > +	page = virt_to_head_page(object);
>> > +	if (WARN_ON_ONCE(!PageSlab(page))) {
>> > +		pr_cont(" non-slab memory.\n");
>> > +		return;
>> > +	}
>> > +	kp.kp_ptr = object;
>> > +	kp.kp_page = page;
>> > +	kp.kp_nstack = KS_ADDRS_COUNT;
>> > +	kmem_provenance(&kp);
>> 
>> You don't seem to be printing kp.kp_objp anywhere? (unless in later patch, but
>> would make sense in this patch already).
> 
> Good point!
> 
> However, please note that various debugging options reserve space at
> the beginning of the object.  This can make the meaning of kp.kp_objp a
> bit different than one might expect.

Yeah, I think the best would be to match the address that
kmalloc/kmem_cache_alloc() would return, thus the beginning of the object
itself, so you can calculate the offset within it, etc.
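
That is, with kp_objp normalized to the kmalloc()/kmem_cache_alloc()
return value, the interesting offset falls out directly (sketch):

	unsigned long offset = (unsigned long)object - (unsigned long)kp.kp_objp;
	/* offset now locates the field (e.g. an rcu_head) within the object. */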

>> > --- a/mm/util.c
>> > +++ b/mm/util.c
>> 
>> I think mm/debug.c is a better fit as it already has dump_page() of a similar
>> nature. Also you can call that from from mem_dump_obj() at least in case when
>> the more specific handlers fail. It will even include page_owner info if enabled! :)
> 
> I will count this as one vote for mm/debug.c.
> 
> Two things to consider, though...  First, Joonsoo suggests that the fact
> that this produces useful information without any debugging information
> enabled means that it is not really debugging as such.

Well there's already dump_page() which also produces information without special
configs.
We're not the best subsystem in this kind of consistency...

> Second, mm/debug.c does not include either slab.h or vmalloc.h.  The
> second point might not be a showstopper, but I was interpreting it to
> mean that the file's role was less central.

I think it can include whatever becomes needed there :)

> 							Thanx, Paul
> 
>> Thanks,
>> Vlastimil
>> 
>> > @@ -970,3 +970,28 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
>> >  	kunmap_atomic(addr1);
>> >  	return ret;
>> >  }
>> > +
>> > +/**
>> > + * mem_dump_obj - Print available provenance information
>> > + * @object: object for which to find provenance information.
>> > + *
>> > + * This function uses pr_cont(), so that the caller is expected to have
>> > + * printed out whatever preamble is appropriate.  The provenance information
>> > + * depends on the type of object and on how much debugging is enabled.
>> > + * For example, for a slab-cache object, the slab name is printed, and,
>> > + * if available, the return address and stack trace from the allocation
>> > + * of that object.
>> > + */
>> > +void mem_dump_obj(void *object)
>> > +{
>> > +	if (!virt_addr_valid(object)) {
>> > +		pr_cont(" non-paged (local) memory.\n");
>> > +		return;
>> > +	}
>> > +	if (kmem_valid_obj(object)) {
>> > +		kmem_dump_obj(object);
>> > +		return;
>> > +	}
>> > +	pr_cont(" non-slab memory.\n");
>> > +}
>> > +EXPORT_SYMBOL_GPL(mem_dump_obj);
>> > 
>> 
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 3/5] mm: Make mem_dump_obj() handle vmalloc() memory
  2020-12-09 23:23       ` Paul E. McKenney
@ 2020-12-10 10:49         ` Vlastimil Babka
  0 siblings, 0 replies; 49+ messages in thread
From: Vlastimil Babka @ 2020-12-10 10:49 UTC (permalink / raw)
  To: paulmck
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii, linux-mm

On 12/10/20 12:23 AM, Paul E. McKenney wrote:
> On Wed, Dec 09, 2020 at 06:51:20PM +0100, Vlastimil Babka wrote:
>> On 12/9/20 2:13 AM, paulmck@kernel.org wrote:
>> > From: "Paul E. McKenney" <paulmck@kernel.org>
>> > 
>> > This commit adds vmalloc() support to mem_dump_obj().  Note that the
>> > vmalloc_dump_obj() function combines the checking and dumping, in
>> > contrast with the split between kmem_valid_obj() and kmem_dump_obj().
>> > The reason for the difference is that the checking in the vmalloc()
>> > case involves acquiring a global lock, and redundant acquisitions of
>> > global locks should be avoided, even on not-so-fast paths.
>> > 
>> > Note that this change causes on-stack variables to be reported as
>> > vmalloc() storage from kernel_clone() or similar, depending on the degree
>> > of inlining that your compiler does.  This is likely more helpful than
>> > the earlier "non-paged (local) memory".
>> > 
>> > Cc: Andrew Morton <akpm@linux-foundation.org>
>> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> > Cc: <linux-mm@kvack.org>
>> > Reported-by: Andrii Nakryiko <andrii@kernel.org>
>> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>> 
>> ...
>> 
>> > --- a/mm/vmalloc.c
>> > +++ b/mm/vmalloc.c
>> > @@ -3431,6 +3431,18 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>> >  }
>> >  #endif	/* CONFIG_SMP */
>> >  
>> > +bool vmalloc_dump_obj(void *object)
>> > +{
>> > +	struct vm_struct *vm;
>> > +	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
>> > +
>> > +	vm = find_vm_area(objp);
>> > +	if (!vm)
>> > +		return false;
>> > +	pr_cont(" vmalloc allocated at %pS\n", vm->caller);
>> 
>> Would it be useful to print the vm area boundaries too?
> 
> Like this?

Yeah, thanks!

> I also considered instead using vm->size, but that always seems to include
> an extra page, so a 4-page span is listed as having 20480 bytes and a
> one-page span is 8192 bytes.  This might be more accurate in some sense,
> but would be quite confusing to someone trying to compare this size with
> that requested in the vmalloc() call.

Right.

> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 33e0469c289c2f78e5f0d0c463c8ee3357d273c0
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Wed Dec 9 15:15:27 2020 -0800
> 
>     mm: Make mem_dump_obj() vmalloc() dumps include start and length
>     
>     This commit adds the starting address and number of pages to the vmalloc()
>     information dumped by way of vmalloc_dump_obj().
>     
>     Cc: Andrew Morton <akpm@linux-foundation.org>
>     Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>     Cc: <linux-mm@kvack.org>
>     Reported-by: Andrii Nakryiko <andrii@kernel.org>
>     Suggested-by: Vlastimil Babka <vbabka@suse.cz>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 7421719..77b1100 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3439,7 +3439,8 @@ bool vmalloc_dump_obj(void *object)
>  	vm = find_vm_area(objp);
>  	if (!vm)
>  		return false;
> -	pr_cont(" vmalloc allocated at %pS\n", vm->caller);
> +	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> +		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
>  	return true;
>  }
>  
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 1/5] mm: Add mem_dump_obj() to print source of memory block
  2020-12-09  1:12   ` [PATCH v2 sl-b 1/5] mm: Add mem_dump_obj() to print source of memory block paulmck
  2020-12-09  8:17     ` Christoph Hellwig
  2020-12-09 17:28     ` Vlastimil Babka
@ 2020-12-10 12:04     ` Joonsoo Kim
  2020-12-10 23:41       ` Paul E. McKenney
  2 siblings, 1 reply; 49+ messages in thread
From: Joonsoo Kim @ 2020-12-10 12:04 UTC (permalink / raw)
  To: paulmck
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, andrii, Christoph Lameter,
	Pekka Enberg, David Rientjes, linux-mm

On Tue, Dec 08, 2020 at 05:12:59PM -0800, paulmck@kernel.org wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> There are kernel facilities such as per-CPU reference counts that give
> error messages in generic handlers or callbacks, whose messages are
> unenlightening.  In the case of per-CPU reference-count underflow, this
> is not a problem when creating a new use of this facility because in that
> case the bug is almost certainly in the code implementing that new use.
> However, trouble arises when deploying across many systems, which might
> exercise corner cases that were not seen during development and testing.
> Here, it would be really nice to get some kind of hint as to which of
> several uses the underflow was caused by.
> 
> This commit therefore exposes a mem_dump_obj() function that takes
> a pointer to memory (which must still be allocated if it has been
> dynamically allocated) and prints available information on where that
> memory came from.  This pointer can reference the middle of the block as
> well as the beginning of the block, as needed by things like RCU callback
> functions and timer handlers that might not know where the beginning of
> the memory block is.  These functions and handlers can use mem_dump_obj()
> to print out better hints as to where the problem might lie.
> 
> The information printed can depend on kernel configuration.  For example,
> the allocation return address can be printed only for slab and slub,
> and even then only when the necessary debug has been enabled.  For slab,
> build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
> > to the next power of two or use the SLAB_STORE_USER flag when creating the
> kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
> boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
> if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
> to enable printing of the allocation-time stack trace.
> 
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: <linux-mm@kvack.org>
> Reported-by: Andrii Nakryiko <andrii@kernel.org>
> [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
> [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Introducing three functions, kmem_valid_obj(), kmem_provenance(), and
mem_dump_obj(), looks better than patchset v1. Nice work. A few comments
below.

> ---
>  include/linux/mm.h   |  2 ++
>  include/linux/slab.h |  2 ++
>  mm/slab.c            | 28 +++++++++++++++++++++
>  mm/slab.h            | 11 +++++++++
>  mm/slab_common.c     | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/slob.c            |  7 ++++++
>  mm/slub.c            | 40 ++++++++++++++++++++++++++++++
>  mm/util.c            | 25 +++++++++++++++++++
>  8 files changed, 184 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ef360fe..1eea266 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3153,5 +3153,7 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
>  
>  extern int sysctl_nr_trim_pages;
>  
> +void mem_dump_obj(void *object);
> +
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_MM_H */
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index dd6897f..169b511 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -186,6 +186,8 @@ void kfree(const void *);
>  void kfree_sensitive(const void *);
>  size_t __ksize(const void *);
>  size_t ksize(const void *);
> +bool kmem_valid_obj(void *object);
> +void kmem_dump_obj(void *object);
>  
>  #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
>  void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
> diff --git a/mm/slab.c b/mm/slab.c
> index b111356..72b6743 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3602,6 +3602,34 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
>  EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
>  #endif
>  
> +void kmem_provenance(struct kmem_provenance *kpp)

To open up the possibility of future enhancement, the name "provenance"
does not look good to me. This function could be used to extract various
kinds of object information, so something such as kmem_obj_info() looks
better to me. Any thoughts?
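
Sketched, the suggested rename would be along these lines (the structure
would presumably be renamed to match):

	void kmem_obj_info(struct kmem_obj_info *kpp);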

> +{
> +#ifdef DEBUG
> +	struct kmem_cache *cachep;
> +	void *object = kpp->kp_ptr;
> +	unsigned int objnr;
> +	void *objp;
> +	struct page *page = kpp->kp_page;
> +
> +	cachep = page->slab_cache;
> +	if (!(cachep->flags & SLAB_STORE_USER)) {
> +		kpp->kp_ret = NULL;
> +		goto nodebug;
> +	}
> +	objp = object - obj_offset(cachep);
> +	page = virt_to_head_page(objp);
> +	objnr = obj_to_index(cachep, page, objp);
> +	objp = index_to_obj(cachep, page, objnr);
> +	kpp->kp_objp = objp;
> +	kpp->kp_ret = *dbg_userword(cachep, objp);
> +nodebug:
> +#else
> +	kpp->kp_ret = NULL;
> +#endif
> +	if (kpp->kp_nstack)
> +		kpp->kp_stack[0] = NULL;
> +}
> +
>  static __always_inline void *
>  __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
>  {
> diff --git a/mm/slab.h b/mm/slab.h
> index 6d7c6a5..28a41d5 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -630,4 +630,15 @@ static inline bool slab_want_init_on_free(struct kmem_cache *c)
>  	return false;
>  }
>  
> +#define KS_ADDRS_COUNT 16
> +struct kmem_provenance {
> +	void *kp_ptr;
> +	struct page *kp_page;
> +	void *kp_objp;
> +	void *kp_ret;
> +	void *kp_stack[KS_ADDRS_COUNT];
> +	int kp_nstack;
> +};
> +void kmem_provenance(struct kmem_provenance *kpp);
> +
>  #endif /* MM_SLAB_H */
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index f9ccd5d..09f0cbc 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -536,6 +536,75 @@ bool slab_is_available(void)
>  	return slab_state >= UP;
>  }
>  
> +/**
> + * kmem_valid_obj - does the pointer reference a valid slab object?
> + * @object: pointer to query.
> + *
> + * Return: %true if the pointer is to a not-yet-freed object from
> + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
> + * is to an already-freed object, and %false otherwise.
> + */
> +bool kmem_valid_obj(void *object)
> +{
> +	struct page *page;
> +
> +	if (!virt_addr_valid(object))
> +		return false;
> +	page = virt_to_head_page(object);
> +	return PageSlab(page);
> +}
> +EXPORT_SYMBOL_GPL(kmem_valid_obj);
> +
> +/**
> + * kmem_dump_obj - Print available slab provenance information
> + * @object: slab object for which to find provenance information.
> + *
> + * This function uses pr_cont(), so that the caller is expected to have
> + * printed out whatever preamble is appropriate.  The provenance information
> + * depends on the type of object and on how much debugging is enabled.
> + * For a slab-cache object, the fact that it is a slab object is printed,
> + * and, if available, the slab name, return address, and stack trace from
> + * the allocation of that object.
> + *
> + * This function will splat if passed a pointer to a non-slab object.
> + * If you are not sure what type of object you have, you should instead
> + * use mem_dump_obj().
> + */
> +void kmem_dump_obj(void *object)
> +{
> +	int i;
> +	struct page *page;
> +	struct kmem_provenance kp;
> +
> +	if (WARN_ON_ONCE(!virt_addr_valid(object)))
> +		return;
> +	page = virt_to_head_page(object);
> +	if (WARN_ON_ONCE(!PageSlab(page))) {
> +		pr_cont(" non-slab memory.\n");
> +		return;
> +	}
> +	kp.kp_ptr = object;
> +	kp.kp_page = page;
> +	kp.kp_nstack = KS_ADDRS_COUNT;

I hope that kmem_dump_obj() doesn't set any kp fields. That job is
reserved for kmem_provenance().

> +	kmem_provenance(&kp);
> +	if (page->slab_cache)
> +		pr_cont(" slab %s", page->slab_cache->name);

Rather than accessing page->slab_cache, it's better to introduce a
slab_cache field in kp and use it. Note that slob doesn't use
page->slab_cache. In slob, that field on struct page would be NULL, so
it would not cause a problem. But using kp makes things clear.

> +	else
> +		pr_cont(" slab ");
> +	if (kp.kp_ret)
> +		pr_cont(" allocated at %pS\n", kp.kp_ret);
> +	else
> +		pr_cont("\n");
> +	if (kp.kp_stack[0]) {

This check would be useless since we check it on every iteration.
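
That is, the loop body alone already handles an empty trace, so the
simplified form would be (sketch):

	for (i = 0; i < ARRAY_SIZE(kp.kp_stack); i++) {
		if (!kp.kp_stack[i])
			break;
		pr_info("    %pS\n", kp.kp_stack[i]);
	}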
 
Thanks.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 1/5] mm: Add mem_dump_obj() to print source of memory block
  2020-12-10 10:48         ` Vlastimil Babka
@ 2020-12-10 19:56           ` Paul E. McKenney
  0 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2020-12-10 19:56 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Christoph Lameter, Pekka Enberg, David Rientjes, linux-mm

On Thu, Dec 10, 2020 at 11:48:26AM +0100, Vlastimil Babka wrote:
> On 12/10/20 12:04 AM, Paul E. McKenney wrote:
> >> > +/**
> >> > + * kmem_valid_obj - does the pointer reference a valid slab object?
> >> > + * @object: pointer to query.
> >> > + *
> >> > + * Return: %true if the pointer is to a not-yet-freed object from
> >> > + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
> >> > + * is to an already-freed object, and %false otherwise.
> >> > + */
> >> 
> >> It should be possible to find out more about the object being free or not than
> >> you currently do. At least to find out if it's definitely free. When it appears
> >> allocated, it can actually still be free in some kind of e.g. per-cpu or
> >> per-node cache that would be infeasible to check. But that improvement to the
> >> output can also be added later. Also SLUB stores the freeing stacktrace, which
> >> might be useful...
> > 
> > I can see how this could help debugging a use-after-free situation,
> > at least as long as the poor sap that subsequently allocated it doesn't
> > free it.
> > 
> > I can easily add more fields to the kmem_provenance structure.  Maybe
> > it would make sense to have another exported API that you provide a
> > kmem_provenance structure to, and it fills it in.
> > 
> > One caution though...  I rely on the object being allocated.
> > If it officially might already be freed, complex and high-overhead
> > synchronization seems to be required to safely access the various data
> > structures.
> 
> Good point! It's easy to forget that when one is used to similar digging in a
> crash dump, where nothing changes.

Maybe a similar addition to the crash-analysis tools would be helpful?

> > So any use on an already-freed object is on a "you break it you get to
> > keep the pieces" basis.  On the other hand, if you are dealing with a
> > use-after-free situation, life is hard anyway.
> 
> Yeah, even now I think it's potentially dangerous, as you can get
> kmem_valid_obj() returning true because PageSlab(page) is true. But the object
> might already be free, so as soon as another CPU frees another object from the
> same slab page, the page also gets freed... or it was already freed and then
> allocated by another slab, so it's PageSlab() again.
> I guess at least some safety could be achieved by pinning the page with
> get_page_unless_zero. But maybe your current implementation is already safe;
> I need to check in detail.

The code on the various free paths looks to me to make the same
assumptions that I am making.  So if this is unsafe, we have other
problems.

> > Or am I missing your point?
> > 
> >> > +bool kmem_valid_obj(void *object)
> >> > +{
> >> > +	struct page *page;
> >> > +
> >> > +	if (!virt_addr_valid(object))
> >> > +		return false;
> >> > +	page = virt_to_head_page(object);
> >> > +	return PageSlab(page);
> >> > +}
> >> > +EXPORT_SYMBOL_GPL(kmem_valid_obj);
> >> > +
> >> > +/**
> >> > + * kmem_dump_obj - Print available slab provenance information
> >> > + * @object: slab object for which to find provenance information.
> >> > + *
> >> > + * This function uses pr_cont(), so that the caller is expected to have
> >> > + * printed out whatever preamble is appropriate.  The provenance information
> >> > + * depends on the type of object and on how much debugging is enabled.
> >> > + * For a slab-cache object, the fact that it is a slab object is printed,
> >> > + * and, if available, the slab name, return address, and stack trace from
> >> > + * the allocation of that object.
> >> > + *
> >> > + * This function will splat if passed a pointer to a non-slab object.
> >> > + * If you are not sure what type of object you have, you should instead
> >> > + * use mem_dump_obj().
> >> > + */
> >> > +void kmem_dump_obj(void *object)
> >> > +{
> >> > +	int i;
> >> > +	struct page *page;
> >> > +	struct kmem_provenance kp;
> >> > +
> >> > +	if (WARN_ON_ONCE(!virt_addr_valid(object)))
> >> > +		return;
> >> > +	page = virt_to_head_page(object);
> >> > +	if (WARN_ON_ONCE(!PageSlab(page))) {
> >> > +		pr_cont(" non-slab memory.\n");
> >> > +		return;
> >> > +	}
> >> > +	kp.kp_ptr = object;
> >> > +	kp.kp_page = page;
> >> > +	kp.kp_nstack = KS_ADDRS_COUNT;
> >> > +	kmem_provenance(&kp);
> >> 
> >> You don't seem to be printing kp.kp_objp anywhere? (unless in later patch, but
> >> would make sense in this patch already).
> > 
> > Good point!
> > 
> > However, please note that various debugging options reserve space at
> > the beginning of the object.  This can make the meaning of kp.kp_objp a
> > bit different than one might expect.
> 
> Yeah, I think the best would be to match the address that
> kmalloc/kmem_cache_alloc() would return, thus the beginning of the object
> itself, so you can calculate the offset within it, etc.

My thought is to do both.  Show the start address, the data offset (if
nonzero), and the pointer offset within the data.  My guess is that in
the absence of things like slub_debug=U, the pointer offset within the
data is the best way to figure out which structure is involved.

Or do you use other tricks to work this sort of thing out?
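
One way the pointer offset narrows things down, using a made-up structure
purely for illustration:

	struct foo {
		int a;			/* offset 0 */
		struct rcu_head rh;	/* offset 8 on 64-bit, say */
	};

	/* If the dump reports pointer offset 8 within the object, any
	 * candidate structure with an rcu_head at offset 8, such as
	 * struct foo above, becomes a prime suspect. */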

> >> > --- a/mm/util.c
> >> > +++ b/mm/util.c
> >> 
> >> I think mm/debug.c is a better fit as it already has dump_page() of a similar
> >> nature. Also you can call that from from mem_dump_obj() at least in case when
> >> the more specific handlers fail. It will even include page_owner info if enabled! :)
> > 
> > I will count this as one vote for mm/debug.c.
> > 
> > Two things to consider, though...  First, Joonsoo suggests that the fact
> > that this produces useful information without any debugging information
> > enabled means that it is not really debugging as such.
> 
> Well there's already dump_page() which also produces information without special
> configs.
> We're not the best subsystem in this kind of consistency...
> 
> > Second, mm/debug.c does not include either slab.h or vmalloc.h.  The
> > second point might not be a showstopper, but I was interpreting it to
> > mean that the file's role was less central.
> 
> I think it can include whatever becomes needed there :)

I figured that there was a significant probability that I would have to
move it, and I really don't have a basis for a preference, let alone
an actual preference.  But I would like to avoid moving it more than once, so
I also figured I should give anyone else having an educated preference
a chance to speak up.  ;-)

							Thanx, Paul

> >> Thanks,
> >> Vlastimil
> >> 
> >> > @@ -970,3 +970,28 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
> >> >  	kunmap_atomic(addr1);
> >> >  	return ret;
> >> >  }
> >> > +
> >> > +/**
> >> > + * mem_dump_obj - Print available provenance information
> >> > + * @object: object for which to find provenance information.
> >> > + *
> >> > + * This function uses pr_cont(), so that the caller is expected to have
> >> > + * printed out whatever preamble is appropriate.  The provenance information
> >> > + * depends on the type of object and on how much debugging is enabled.
> >> > + * For example, for a slab-cache object, the slab name is printed, and,
> >> > + * if available, the return address and stack trace from the allocation
> >> > + * of that object.
> >> > + */
> >> > +void mem_dump_obj(void *object)
> >> > +{
> >> > +	if (!virt_addr_valid(object)) {
> >> > +		pr_cont(" non-paged (local) memory.\n");
> >> > +		return;
> >> > +	}
> >> > +	if (kmem_valid_obj(object)) {
> >> > +		kmem_dump_obj(object);
> >> > +		return;
> >> > +	}
> >> > +	pr_cont(" non-slab memory.\n");
> >> > +}
> >> > +EXPORT_SYMBOL_GPL(mem_dump_obj);
> >> > 
> >> 
> > 
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 sl-b 1/5] mm: Add mem_dump_obj() to print source of memory block
  2020-12-10 12:04     ` Joonsoo Kim
@ 2020-12-10 23:41       ` Paul E. McKenney
  0 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2020-12-10 23:41 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, andrii, Christoph Lameter,
	Pekka Enberg, David Rientjes, linux-mm

On Thu, Dec 10, 2020 at 09:04:11PM +0900, Joonsoo Kim wrote:
> On Tue, Dec 08, 2020 at 05:12:59PM -0800, paulmck@kernel.org wrote:
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > There are kernel facilities such as per-CPU reference counts that give
> > error messages in generic handlers or callbacks, whose messages are
> > unenlightening.  In the case of per-CPU reference-count underflow, this
> > is not a problem when creating a new use of this facility because in that
> > case the bug is almost certainly in the code implementing that new use.
> > However, trouble arises when deploying across many systems, which might
> > exercise corner cases that were not seen during development and testing.
> > Here, it would be really nice to get some kind of hint as to which of
> > several uses the underflow was caused by.
> > 
> > This commit therefore exposes a mem_dump_obj() function that takes
> > a pointer to memory (which must still be allocated if it has been
> > dynamically allocated) and prints available information on where that
> > memory came from.  This pointer can reference the middle of the block as
> > well as the beginning of the block, as needed by things like RCU callback
> > functions and timer handlers that might not know where the beginning of
> > the memory block is.  These functions and handlers can use mem_dump_obj()
> > to print out better hints as to where the problem might lie.
> > 
> > The information printed can depend on kernel configuration.  For example,
> > the allocation return address can be printed only for slab and slub,
> > and even then only when the necessary debug has been enabled.  For slab,
> > build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
> > to the next power of two or use the SLAB_STORE_USER flag when creating the
> > kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
> > boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
> > if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
> > to enable printing of the allocation-time stack trace.
> > 
> > Cc: Christoph Lameter <cl@linux.com>
> > Cc: Pekka Enberg <penberg@kernel.org>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: <linux-mm@kvack.org>
> > Reported-by: Andrii Nakryiko <andrii@kernel.org>
> > [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
> > [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> Introducing three functions, kmem_valid_obj(), kmem_provenance(), and
> mem_dump_obj(), looks better than patchset v1. Nice work. A few comments
> below.

Glad you like it!

> > ---
> >  include/linux/mm.h   |  2 ++
> >  include/linux/slab.h |  2 ++
> >  mm/slab.c            | 28 +++++++++++++++++++++
> >  mm/slab.h            | 11 +++++++++
> >  mm/slab_common.c     | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  mm/slob.c            |  7 ++++++
> >  mm/slub.c            | 40 ++++++++++++++++++++++++++++++
> >  mm/util.c            | 25 +++++++++++++++++++
> >  8 files changed, 184 insertions(+)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index ef360fe..1eea266 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -3153,5 +3153,7 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
> >  
> >  extern int sysctl_nr_trim_pages;
> >  
> > +void mem_dump_obj(void *object);
> > +
> >  #endif /* __KERNEL__ */
> >  #endif /* _LINUX_MM_H */
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index dd6897f..169b511 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -186,6 +186,8 @@ void kfree(const void *);
> >  void kfree_sensitive(const void *);
> >  size_t __ksize(const void *);
> >  size_t ksize(const void *);
> > +bool kmem_valid_obj(void *object);
> > +void kmem_dump_obj(void *object);
> >  
> >  #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
> >  void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
> > diff --git a/mm/slab.c b/mm/slab.c
> > index b111356..72b6743 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -3602,6 +3602,34 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
> >  EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
> >  #endif
> >  
> > +void kmem_provenance(struct kmem_provenance *kpp)
> 
> To open up the possibility of future enhancement, the name "provenance"
> does not look good to me. This function could be used to extract various
> kinds of object information, so something such as kmem_obj_info() looks
> better to me. Any thoughts?

The name kmem_obj_info() works for me, updated.

> > +{
> > +#ifdef DEBUG
> > +	struct kmem_cache *cachep;
> > +	void *object = kpp->kp_ptr;
> > +	unsigned int objnr;
> > +	void *objp;
> > +	struct page *page = kpp->kp_page;
> > +
> > +	cachep = page->slab_cache;
> > +	if (!(cachep->flags & SLAB_STORE_USER)) {
> > +		kpp->kp_ret = NULL;
> > +		goto nodebug;
> > +	}
> > +	objp = object - obj_offset(cachep);
> > +	page = virt_to_head_page(objp);
> > +	objnr = obj_to_index(cachep, page, objp);
> > +	objp = index_to_obj(cachep, page, objnr);
> > +	kpp->kp_objp = objp;
> > +	kpp->kp_ret = *dbg_userword(cachep, objp);
> > +nodebug:
> > +#else
> > +	kpp->kp_ret = NULL;
> > +#endif
> > +	if (kpp->kp_nstack)
> > +		kpp->kp_stack[0] = NULL;
> > +}
> > +
> >  static __always_inline void *
> >  __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
> >  {
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 6d7c6a5..28a41d5 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -630,4 +630,15 @@ static inline bool slab_want_init_on_free(struct kmem_cache *c)
> >  	return false;
> >  }
> >  
> > +#define KS_ADDRS_COUNT 16
> > +struct kmem_provenance {
> > +	void *kp_ptr;
> > +	struct page *kp_page;
> > +	void *kp_objp;
> > +	void *kp_ret;
> > +	void *kp_stack[KS_ADDRS_COUNT];
> > +	int kp_nstack;
> > +};
> > +void kmem_provenance(struct kmem_provenance *kpp);
> > +
> >  #endif /* MM_SLAB_H */
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index f9ccd5d..09f0cbc 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -536,6 +536,75 @@ bool slab_is_available(void)
> >  	return slab_state >= UP;
> >  }
> >  
> > +/**
> > + * kmem_valid_obj - does the pointer reference a valid slab object?
> > + * @object: pointer to query.
> > + *
> > + * Return: %true if the pointer is to a not-yet-freed object from
> > + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
> > + * is to an already-freed object, and %false otherwise.
> > + */
> > +bool kmem_valid_obj(void *object)
> > +{
> > +	struct page *page;
> > +
> > +	if (!virt_addr_valid(object))
> > +		return false;
> > +	page = virt_to_head_page(object);
> > +	return PageSlab(page);
> > +}
> > +EXPORT_SYMBOL_GPL(kmem_valid_obj);
> > +
> > +/**
> > + * kmem_dump_obj - Print available slab provenance information
> > + * @object: slab object for which to find provenance information.
> > + *
> > + * This function uses pr_cont(), so that the caller is expected to have
> > + * printed out whatever preamble is appropriate.  The provenance information
> > + * depends on the type of object and on how much debugging is enabled.
> > + * For a slab-cache object, the fact that it is a slab object is printed,
> > + * and, if available, the slab name, return address, and stack trace from
> > + * the allocation of that object.
> > + *
> > + * This function will splat if passed a pointer to a non-slab object.
> > + * If you are not sure what type of object you have, you should instead
> > + * use mem_dump_obj().
> > + */
> > +void kmem_dump_obj(void *object)
> > +{
> > +	int i;
> > +	struct page *page;
> > +	struct kmem_provenance kp;
> > +
> > +	if (WARN_ON_ONCE(!virt_addr_valid(object)))
> > +		return;
> > +	page = virt_to_head_page(object);
> > +	if (WARN_ON_ONCE(!PageSlab(page))) {
> > +		pr_cont(" non-slab memory.\n");
> > +		return;
> > +	}
> > +	kp.kp_ptr = object;
> > +	kp.kp_page = page;
> > +	kp.kp_nstack = KS_ADDRS_COUNT;
> 
> I hope that kmem_dump_obj() doesn't set any kp fields. That job is
> reserved for kmem_provenance().

I assigned to kp.kp_ptr to avoid doing it in each of the three variants
of kmem_provenance(), but it is clearly not a big deal to do the three
assignments.  Ditto for kp.kp_page.

I can remove the kp.kp_nstack assignment entirely and have the variants
just use KS_ADDRS_COUNT directly.

And I will zero-initialize kp, thus getting rid of some of the
NULL/0 assignments in the various kmem_provenance() functions.
And a lot of goto statements.
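
Roughly, the direction described above (a sketch only; the eventual v3
signature may differ):

	struct kmem_provenance kp = { };	/* kp_ret and kp_stack[] start out NULL */

	kmem_provenance(&kp, object, page);	/* each variant fills in its fields */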

> > +	kmem_provenance(&kp);
> > +	if (page->slab_cache)
> > +		pr_cont(" slab %s", page->slab_cache->name);
> 
> Rather than accessing page->slab_cache, it's better to introduce a
> slab_cache field in kp and use it. Note that slob doesn't use
> page->slab_cache. In slob, that field on struct page would be NULL, so
> it would not cause a problem. But using kp makes things clear.

Easy enough!

> > +	else
> > +		pr_cont(" slab ");
> > +	if (kp.kp_ret)
> > +		pr_cont(" allocated at %pS\n", kp.kp_ret);
> > +	else
> > +		pr_cont("\n");
> > +	if (kp.kp_stack[0]) {
> 
> This check would be useless since we check it on every iteration.

Good catch, removed.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics
  2020-12-09  1:11 ` [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics Paul E. McKenney
                     ` (4 preceding siblings ...)
  2020-12-09  1:13   ` [PATCH v2 sl-b 5/5] percpu_ref: Dump mem_dump_obj() info upon reference-count underflow paulmck
@ 2020-12-11  1:19   ` Paul E. McKenney
  2020-12-11  1:19     ` [PATCH v3 sl-b 1/6] mm: Add mem_dump_obj() to print source of memory block paulmck
                       ` (5 more replies)
  5 siblings, 6 replies; 49+ messages in thread
From: Paul E. McKenney @ 2020-12-11  1:19 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel

Hello!

This is v3 of the series that improves diagnostics by providing access
to additional information including the return addresses, slab names,
offsets, and sizes collected by the sl*b allocators and by vmalloc().
If the allocator is not configured to collect this information, the
diagnostics fall back to a reasonable approximation of their earlier
state.

One use case is the queue_rcu_work() function, which might be used
by any number of kernel subsystems.  If the caller does back-to-back
invocations of queue_rcu_work(), this constitutes a double-free bug,
and (if so configured) the debug-objects system will flag this, printing
the callback function.  In most cases, printing this function suffices.
However, for double-free bugs involving queue_rcu_work(), the RCU callback
function will always be rcu_work_rcufn(), which provides almost no help to
the poor person trying to find this double-free bug.  The return address
from the allocator of the memory containing the rcu_work structure can
provide an additional valuable clue.

Another use case is the percpu_ref_switch_to_atomic_rcu() function,
which detects percpu_ref reference-count underflow.  Unfortunately,
the only data that this function has access to doesn't have much in the
way of identifying characteristics.  Yes, it might be possible to gain
more information from a crash dump, but it is more convenient for the
needed hints to be in the console log.

Unfortunately, printing the return address in this case is of little help
because this object is allocated from percpu_ref_init(), regardless of
what part of the kernel is responsible for the reference-count underflow
(though perhaps the slab and offsets might help in some cases).  However,
CONFIG_STACKTRACE=y kernels (such as those enabling ftrace) using slub
with debugging enabled also collect stack traces.  This series therefore
also provides a way of extracting these stack traces to provide additional
information to those debugging percpu_ref reference-count underflows.

The patches are as follows:

1.	Add mem_dump_obj() to print source of memory block.

2.	Make mem_dump_obj() handle NULL and zero-sized pointers.

3.	Make mem_dump_obj() handle vmalloc() memory.

4.	Make mem_dump_obj() vmalloc() dumps include start and length.

5.	Make call_rcu() print mem_dump_obj() info for double-freed
	callback.

6.	percpu_ref: Dump mem_dump_obj() info upon reference-count
	underflow.

Changes since v2:

o	Apply more feedback from Joonsoo Kim on naming and code structure.

o	Based on discussions with Vlastimil Babka, added code to print
	offsets and sizes where available.  This can help identify which
	structure is involved.

Changes since v1:

o	Apply feedback from Joonsoo Kim, mostly around naming and
	code structure.

o	Apply fix suggested by Stephen Rothwell for a bug that was
	also located by kbuild test robot.

o	Add support for vmalloc().

o	Add support for special pointers.

o	Additional rework that simplifies use of mem_dump_obj() for
	both the RCU and the percpu_ref use cases.

						Thanx, Paul

------------------------------------------------------------------------

 include/linux/mm.h      |    2 +
 include/linux/slab.h    |    2 +
 include/linux/vmalloc.h |    6 +++
 kernel/rcu/tree.c       |    7 +++-
 lib/percpu-refcount.c   |   12 +++++--
 mm/slab.c               |   20 ++++++++++++
 mm/slab.h               |   12 +++++++
 mm/slab_common.c        |   74 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/slob.c               |    6 +++
 mm/slub.c               |   36 +++++++++++++++++++++++
 mm/util.c               |   45 ++++++++++++++++++++++++-----
 mm/vmalloc.c            |   15 +++++++++
 12 files changed, 224 insertions(+), 13 deletions(-)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v3 sl-b 1/6] mm: Add mem_dump_obj() to print source of memory block
  2020-12-11  1:19   ` [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics Paul E. McKenney
@ 2020-12-11  1:19     ` paulmck
  2020-12-11  2:22       ` Joonsoo Kim
  2020-12-11  1:19     ` [PATCH v3 sl-b 2/6] mm: Make mem_dump_obj() handle NULL and zero-sized pointers paulmck
                       ` (4 subsequent siblings)
  5 siblings, 1 reply; 49+ messages in thread
From: paulmck @ 2020-12-11  1:19 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Paul E. McKenney, Christoph Lameter, Pekka Enberg,
	David Rientjes, linux-mm

From: "Paul E. McKenney" <paulmck@kernel.org>

There are kernel facilities such as per-CPU reference counts that give
error messages in generic handlers or callbacks, whose messages are
unenlightening.  In the case of per-CPU reference-count underflow, this
is not a problem when creating a new use of this facility because in that
case the bug is almost certainly in the code implementing that new use.
However, trouble arises when deploying across many systems, which might
exercise corner cases that were not seen during development and testing.
Here, it would be really nice to get some kind of hint as to which of
several uses the underflow was caused by.

This commit therefore exposes a mem_dump_obj() function that takes
a pointer to memory (which must still be allocated if it has been
dynamically allocated) and prints available information on where that
memory came from.  This pointer can reference the middle of the block as
well as the beginning of the block, as needed by things like RCU callback
functions and timer handlers that might not know where the beginning of
the memory block is.  These functions and handlers can use mem_dump_obj()
to print out better hints as to where the problem might lie.
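
For example, a hypothetical handler (function name and message are
illustrative) might print such a hint as follows:

	static void my_timer_func(struct timer_list *t)	/* hypothetical */
	{
		pr_err("Unexpected timer, enclosing structure:");
		mem_dump_obj(t);	/* t may point into the middle of the block */
	}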

The information printed can depend on kernel configuration.  For example,
the allocation return address can be printed only for slab and slub,
and even then only when the necessary debug has been enabled.  For slab,
build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
to the next power of two or use the SLAB_STORE_USER flag when creating the
kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
to enable printing of the allocation-time stack trace.
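
For example, a minimal sketch of requesting allocation tracking for a
single cache (the cache name and struct foo are illustrative):

	struct kmem_cache *foo_cache;	/* hypothetical cache */

	foo_cache = kmem_cache_create("foo", sizeof(struct foo), 0,
				      SLAB_STORE_USER, NULL);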

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
[ paulmck: Convert to printing and change names per Joonsoo Kim. ]
[ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
[ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
[ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/mm.h   |  2 ++
 include/linux/slab.h |  2 ++
 mm/slab.c            | 20 ++++++++++++++
 mm/slab.h            | 12 +++++++++
 mm/slab_common.c     | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/slob.c            |  6 +++++
 mm/slub.c            | 36 +++++++++++++++++++++++++
 mm/util.c            | 24 +++++++++++++++++
 8 files changed, 176 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ef360fe..1eea266 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3153,5 +3153,7 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
 
 extern int sysctl_nr_trim_pages;
 
+void mem_dump_obj(void *object);
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/include/linux/slab.h b/include/linux/slab.h
index dd6897f..169b511 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -186,6 +186,8 @@ void kfree(const void *);
 void kfree_sensitive(const void *);
 size_t __ksize(const void *);
 size_t ksize(const void *);
+bool kmem_valid_obj(void *object);
+void kmem_dump_obj(void *object);
 
 #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
 void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
diff --git a/mm/slab.c b/mm/slab.c
index b111356..66f00ad 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3633,6 +3633,26 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
 EXPORT_SYMBOL(__kmalloc_node_track_caller);
 #endif /* CONFIG_NUMA */
 
+void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
+{
+	struct kmem_cache *cachep;
+	unsigned int objnr;
+	void *objp;
+
+	kpp->kp_ptr = object;
+	kpp->kp_page = page;
+	cachep = page->slab_cache;
+	kpp->kp_slab_cache = cachep;
+	objp = object - obj_offset(cachep);
+	kpp->kp_data_offset = obj_offset(cachep);
+	page = virt_to_head_page(objp);
+	objnr = obj_to_index(cachep, page, objp);
+	objp = index_to_obj(cachep, page, objnr);
+	kpp->kp_objp = objp;
+	if (DEBUG && cachep->flags & SLAB_STORE_USER)
+		kpp->kp_ret = *dbg_userword(cachep, objp);
+}
+
 /**
  * __do_kmalloc - allocate memory
  * @size: how many bytes of memory are required.
diff --git a/mm/slab.h b/mm/slab.h
index 6d7c6a5..0dc705b 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -630,4 +630,16 @@ static inline bool slab_want_init_on_free(struct kmem_cache *c)
 	return false;
 }
 
+#define KS_ADDRS_COUNT 16
+struct kmem_obj_info {
+	void *kp_ptr;
+	struct page *kp_page;
+	void *kp_objp;
+	unsigned long kp_data_offset;
+	struct kmem_cache *kp_slab_cache;
+	void *kp_ret;
+	void *kp_stack[KS_ADDRS_COUNT];
+};
+void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page);
+
 #endif /* MM_SLAB_H */
diff --git a/mm/slab_common.c b/mm/slab_common.c
index f9ccd5d..df2e203 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -536,6 +536,80 @@ bool slab_is_available(void)
 	return slab_state >= UP;
 }
 
+/**
+ * kmem_valid_obj - does the pointer reference a valid slab object?
+ * @object: pointer to query.
+ *
+ * Return: %true if the pointer is to a not-yet-freed object from
+ * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
+ * is to an already-freed object, and %false otherwise.
+ */
+bool kmem_valid_obj(void *object)
+{
+	struct page *page;
+
+	if (!virt_addr_valid(object))
+		return false;
+	page = virt_to_head_page(object);
+	return PageSlab(page);
+}
+
+/**
+ * kmem_dump_obj - Print available slab provenance information
+ * @object: slab object for which to find provenance information.
+ *
+ * This function uses pr_cont(), so that the caller is expected to have
+ * printed out whatever preamble is appropriate.  The provenance information
+ * depends on the type of object and on how much debugging is enabled.
+ * For a slab-cache object, the fact that it is a slab object is printed,
+ * and, if available, the slab name, return address, and stack trace from
+ * the allocation of that object.
+ *
+ * This function will splat if passed a pointer to a non-slab object.
+ * If you are not sure what type of object you have, you should instead
+ * use mem_dump_obj().
+ */
+void kmem_dump_obj(void *object)
+{
+	char *cp = IS_ENABLED(CONFIG_MMU) ? "" : "/vmalloc";
+	int i;
+	struct page *page;
+	unsigned long ptroffset;
+	struct kmem_obj_info kp = { };
+
+	if (WARN_ON_ONCE(!virt_addr_valid(object)))
+		return;
+	page = virt_to_head_page(object);
+	if (WARN_ON_ONCE(!PageSlab(page))) {
+		pr_cont(" non-slab memory.\n");
+		return;
+	}
+	kmem_obj_info(&kp, object, page);
+	if (kp.kp_slab_cache)
+		pr_cont(" slab%s %s", cp, kp.kp_slab_cache->name);
+	else
+		pr_cont(" slab%s", cp);
+	if (kp.kp_objp)
+		pr_cont(" start %px", kp.kp_objp);
+	if (kp.kp_data_offset)
+		pr_cont(" data offset %lu", kp.kp_data_offset);
+	if (kp.kp_objp) {
+		ptroffset = ((char *)object - (char *)kp.kp_objp) - kp.kp_data_offset;
+		pr_cont(" pointer offset %lu", ptroffset);
+	}
+	if (kp.kp_slab_cache && kp.kp_slab_cache->usersize)
+		pr_cont(" size %u", kp.kp_slab_cache->usersize);
+	if (kp.kp_ret)
+		pr_cont(" allocated at %pS\n", kp.kp_ret);
+	else
+		pr_cont("\n");
+	for (i = 0; i < ARRAY_SIZE(kp.kp_stack); i++) {
+		if (!kp.kp_stack[i])
+			break;
+		pr_info("    %pS\n", kp.kp_stack[i]);
+	}
+}
+
 #ifndef CONFIG_SLOB
 /* Create a cache during boot when no slab services are available yet */
 void __init create_boot_cache(struct kmem_cache *s, const char *name,
diff --git a/mm/slob.c b/mm/slob.c
index 7cc9805..2ed1de2 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -461,6 +461,12 @@ static void slob_free(void *block, int size)
 	spin_unlock_irqrestore(&slob_lock, flags);
 }
 
+void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
+{
+	kpp->kp_ptr = object;
+	kpp->kp_page = page;
+}
+
 /*
  * End of slob allocator proper. Begin kmem_cache_alloc and kmalloc frontend.
  */
diff --git a/mm/slub.c b/mm/slub.c
index b30be23..0459d2a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3918,6 +3918,42 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
 	return 0;
 }
 
+void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
+{
+#ifdef CONFIG_SLUB_DEBUG
+	void *base;
+	int i;
+	unsigned int objnr;
+	void *objp;
+	void *objp0;
+	struct kmem_cache *s = page->slab_cache;
+	struct track *trackp;
+
+	kpp->kp_ptr = object;
+	kpp->kp_page = page;
+	kpp->kp_slab_cache = s;
+	base = page_address(page);
+	objp0 = kasan_reset_tag(object);
+	objp = restore_red_left(s, objp0);
+	objnr = obj_to_index(s, page, objp);
+	kpp->kp_data_offset = (unsigned long)((char *)objp0 - (char *)objp);
+	objp = base + s->size * objnr;
+	kpp->kp_objp = objp;
+	if (WARN_ON_ONCE(objp < base || objp >= base + page->objects * s->size || (objp - base) % s->size) ||
+	    !(s->flags & SLAB_STORE_USER))
+		return;
+	trackp = get_track(s, objp, TRACK_ALLOC);
+	kpp->kp_ret = (void *)trackp->addr;
+#ifdef CONFIG_STACKTRACE
+	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
+		kpp->kp_stack[i] = (void *)trackp->addrs[i];
+		if (!kpp->kp_stack[i])
+			break;
+	}
+#endif
+#endif
+}
+
 /********************************************************************
  *		Kmalloc subsystem
  *******************************************************************/
diff --git a/mm/util.c b/mm/util.c
index 4ddb6e1..f2e0c4d9 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -970,3 +970,27 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
 	kunmap_atomic(addr1);
 	return ret;
 }
+
+/**
+ * mem_dump_obj - Print available provenance information
+ * @object: object for which to find provenance information.
+ *
+ * This function uses pr_cont(), so that the caller is expected to have
+ * printed out whatever preamble is appropriate.  The provenance information
+ * depends on the type of object and on how much debugging is enabled.
+ * For example, for a slab-cache object, the slab name is printed, and,
+ * if available, the return address and stack trace from the allocation
+ * of that object.
+ */
+void mem_dump_obj(void *object)
+{
+	if (!virt_addr_valid(object)) {
+		pr_cont(" non-paged (local) memory.\n");
+		return;
+	}
+	if (kmem_valid_obj(object)) {
+		kmem_dump_obj(object);
+		return;
+	}
+	pr_cont(" non-slab memory.\n");
+}
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 sl-b 2/6] mm: Make mem_dump_obj() handle NULL and zero-sized pointers
  2020-12-11  1:19   ` [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics Paul E. McKenney
  2020-12-11  1:19     ` [PATCH v3 sl-b 1/6] mm: Add mem_dump_obj() to print source of memory block paulmck
@ 2020-12-11  1:19     ` paulmck
  2020-12-11  1:20     ` [PATCH v3 sl-b 3/6] mm: Make mem_dump_obj() handle vmalloc() memory paulmck
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 49+ messages in thread
From: paulmck @ 2020-12-11  1:19 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Paul E. McKenney, Christoph Lameter, Pekka Enberg,
	David Rientjes, linux-mm

From: "Paul E. McKenney" <paulmck@kernel.org>

This commit makes mem_dump_obj() call out NULL and zero-sized pointers
specially instead of classifying them as non-paged memory.
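
For reference, kmalloc(0) returns the special ZERO_SIZE_PTR value
rather than NULL, so the two cases are called out separately
(illustrative sketch):

	void *p = kmalloc(0, GFP_KERNEL);	/* returns ZERO_SIZE_PTR */

	pr_err("p:");
	mem_dump_obj(p);	/* prints " zero-size pointer." */
	pr_err("n:");
	mem_dump_obj(NULL);	/* prints " NULL pointer." */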

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 mm/util.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/util.c b/mm/util.c
index f2e0c4d9..f7c94c8 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -985,7 +985,12 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
 void mem_dump_obj(void *object)
 {
 	if (!virt_addr_valid(object)) {
-		pr_cont(" non-paged (local) memory.\n");
+		if (object == NULL)
+			pr_cont(" NULL pointer.\n");
+		else if (object == ZERO_SIZE_PTR)
+			pr_cont(" zero-size pointer.\n");
+		else
+			pr_cont(" non-paged (local) memory.\n");
 		return;
 	}
 	if (kmem_valid_obj(object)) {
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 sl-b 3/6] mm: Make mem_dump_obj() handle vmalloc() memory
  2020-12-11  1:19   ` [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics Paul E. McKenney
  2020-12-11  1:19     ` [PATCH v3 sl-b 1/6] mm: Add mem_dump_obj() to print source of memory block paulmck
  2020-12-11  1:19     ` [PATCH v3 sl-b 2/6] mm: Make mem_dump_obj() handle NULL and zero-sized pointers paulmck
@ 2020-12-11  1:20     ` paulmck
  2020-12-11  1:20     ` [PATCH v3 sl-b 4/6] mm: Make mem_dump_obj() vmalloc() dumps include start and length paulmck
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 49+ messages in thread
From: paulmck @ 2020-12-11  1:20 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Paul E. McKenney, linux-mm

From: "Paul E. McKenney" <paulmck@kernel.org>

This commit adds vmalloc() support to mem_dump_obj().  Note that the
vmalloc_dump_obj() function combines the checking and dumping, in
contrast with the split between kmem_valid_obj() and kmem_dump_obj().
The reason for the difference is that the checking in the vmalloc()
case involves acquiring a global lock, and redundant acquisitions of
global locks should be avoided, even on not-so-fast paths.

Note that this change causes on-stack variables to be reported as
vmalloc() storage from kernel_clone() or similar, depending on the degree
of inlining that your compiler does.  This is likely more helpful than
the earlier "non-paged (local) memory".

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/vmalloc.h |  6 ++++++
 mm/util.c               | 14 ++++++++------
 mm/vmalloc.c            | 12 ++++++++++++
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 938eaf9..c89c2be 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -248,4 +248,10 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 int register_vmap_purge_notifier(struct notifier_block *nb);
 int unregister_vmap_purge_notifier(struct notifier_block *nb);
 
+#ifdef CONFIG_MMU
+bool vmalloc_dump_obj(void *object);
+#else
+static inline bool vmalloc_dump_obj(void *object) { return false; }
+#endif
+
 #endif /* _LINUX_VMALLOC_H */
diff --git a/mm/util.c b/mm/util.c
index f7c94c8..dcde696 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -984,18 +984,20 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
  */
 void mem_dump_obj(void *object)
 {
+	if (kmem_valid_obj(object)) {
+		kmem_dump_obj(object);
+		return;
+	}
+	if (vmalloc_dump_obj(object))
+		return;
 	if (!virt_addr_valid(object)) {
 		if (object == NULL)
 			pr_cont(" NULL pointer.\n");
 		else if (object == ZERO_SIZE_PTR)
 			pr_cont(" zero-size pointer.\n");
 		else
-			pr_cont(" non-paged (local) memory.\n");
-		return;
-	}
-	if (kmem_valid_obj(object)) {
-		kmem_dump_obj(object);
+			pr_cont(" non-paged memory.\n");
 		return;
 	}
-	pr_cont(" non-slab memory.\n");
+	pr_cont(" non-slab/vmalloc memory.\n");
 }
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6ae491a..7421719 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3431,6 +3431,18 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 }
 #endif	/* CONFIG_SMP */
 
+bool vmalloc_dump_obj(void *object)
+{
+	struct vm_struct *vm;
+	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
+
+	vm = find_vm_area(objp);
+	if (!vm)
+		return false;
+	pr_cont(" vmalloc allocated at %pS\n", vm->caller);
+	return true;
+}
+
 #ifdef CONFIG_PROC_FS
 static void *s_start(struct seq_file *m, loff_t *pos)
 	__acquires(&vmap_purge_lock)
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 sl-b 4/6] mm: Make mem_dump_obj() vmalloc() dumps include start and length
  2020-12-11  1:19   ` [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics Paul E. McKenney
                       ` (2 preceding siblings ...)
  2020-12-11  1:20     ` [PATCH v3 sl-b 3/6] mm: Make mem_dump_obj() handle vmalloc() memory paulmck
@ 2020-12-11  1:20     ` paulmck
  2020-12-11  1:20     ` [PATCH v3 sl-b 5/6] rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback paulmck
  2020-12-11  1:20     ` [PATCH v3 sl-b 6/6] percpu_ref: Dump mem_dump_obj() info upon reference-count underflow paulmck
  5 siblings, 0 replies; 49+ messages in thread
From: paulmck @ 2020-12-11  1:20 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Paul E. McKenney, linux-mm

From: "Paul E. McKenney" <paulmck@kernel.org>

This commit adds the starting address and number of pages to the vmalloc()
information dumped by way of vmalloc_dump_obj().
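
With this change, a dump might read as follows (values illustrative):

	vp: 4-page vmalloc region starting at 0xffffc90000f90000 allocated at kernel_clone+0xf3/0x730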

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 mm/vmalloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7421719..77b1100 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3439,7 +3439,8 @@ bool vmalloc_dump_obj(void *object)
 	vm = find_vm_area(objp);
 	if (!vm)
 		return false;
-	pr_cont(" vmalloc allocated at %pS\n", vm->caller);
+	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
+		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
 	return true;
 }
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 sl-b 5/6] rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback
  2020-12-11  1:19   ` [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics Paul E. McKenney
                       ` (3 preceding siblings ...)
  2020-12-11  1:20     ` [PATCH v3 sl-b 4/6] mm: Make mem_dump_obj() vmalloc() dumps include start and length paulmck
@ 2020-12-11  1:20     ` paulmck
  2020-12-11  1:20     ` [PATCH v3 sl-b 6/6] percpu_ref: Dump mem_dump_obj() info upon reference-count underflow paulmck
  5 siblings, 0 replies; 49+ messages in thread
From: paulmck @ 2020-12-11  1:20 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Paul E. McKenney, Christoph Lameter, Pekka Enberg,
	David Rientjes, linux-mm

From: "Paul E. McKenney" <paulmck@kernel.org>

The debug-object double-free checks in __call_rcu() print out the
RCU callback function, which is usually sufficient to track down the
double free.  However, all uses of things like queue_rcu_work() will
have the same RCU callback function (rcu_work_rcufn() in this case),
so a diagnostic message for a double queue_rcu_work() needs more than
just the callback function.

This commit therefore calls mem_dump_obj() to dump out any additional
available information on the double-freed callback.
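
For example, a hypothetical double enqueue of this sort (the work
function name is illustrative):

	static struct rcu_work my_rwork;	/* hypothetical */

	INIT_RCU_WORK(&my_rwork, my_work_func);
	queue_rcu_work(system_wq, &my_rwork);
	queue_rcu_work(system_wq, &my_rwork);	/* double free of my_rwork.rcu */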

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b408dca..80ceee5 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2959,6 +2959,7 @@ static void check_cb_ovld(struct rcu_data *rdp)
 static void
 __call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
+	static atomic_t doublefrees;
 	unsigned long flags;
 	struct rcu_data *rdp;
 	bool was_alldone;
@@ -2972,8 +2973,10 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
 		 * Use rcu:rcu_callback trace event to find the previous
 		 * time callback was passed to __call_rcu().
 		 */
-		WARN_ONCE(1, "__call_rcu(): Double-freed CB %p->%pS()!!!\n",
-			  head, head->func);
+		if (atomic_inc_return(&doublefrees) < 4) {
+			pr_err("%s(): Double-freed CB %p->%pS()!!!  ", __func__, head, head->func);
+			mem_dump_obj(head);
+		}
 		WRITE_ONCE(head->func, rcu_leak_callback);
 		return;
 	}
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 sl-b 6/6] percpu_ref: Dump mem_dump_obj() info upon reference-count underflow
  2020-12-11  1:19   ` [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics Paul E. McKenney
                       ` (4 preceding siblings ...)
  2020-12-11  1:20     ` [PATCH v3 sl-b 5/6] rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback paulmck
@ 2020-12-11  1:20     ` paulmck
  5 siblings, 0 replies; 49+ messages in thread
From: paulmck @ 2020-12-11  1:20 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, iamjoonsoo.kim, andrii,
	Paul E. McKenney, Ming Lei, Jens Axboe

From: "Paul E. McKenney" <paulmck@kernel.org>

Reference-count underflow for percpu_ref is detected in the RCU callback
percpu_ref_switch_to_atomic_rcu(), and the resulting warning does not
print anything allowing easy identification of which percpu_ref use
case is underflowing.  This is of course not normally a problem when
developing a new percpu_ref use case because it is most likely that
the problem resides in this new use case.  However, when deploying a
new kernel to a large set of servers, the underflow might well be a new
corner case in any of the old percpu_ref use cases.

This commit therefore calls mem_dump_obj() to dump out any additional
available information on the underflowing percpu_ref instance.
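
For example, a hypothetical imbalance (the enclosing structure and
release function are illustrative), which is detected once the ref is
killed and switched to atomic mode:

	percpu_ref_init(&foo->ref, foo_release, 0, GFP_KERNEL);
	percpu_ref_get(&foo->ref);
	percpu_ref_put(&foo->ref);
	percpu_ref_put(&foo->ref);	/* one put too many */
	percpu_ref_kill(&foo->ref);	/* underflow reported from
					 * percpu_ref_switch_to_atomic_rcu() */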

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 lib/percpu-refcount.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index e59eda0..a1071cd 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -5,6 +5,7 @@
 #include <linux/sched.h>
 #include <linux/wait.h>
 #include <linux/slab.h>
+#include <linux/mm.h>
 #include <linux/percpu-refcount.h>
 
 /*
@@ -168,6 +169,7 @@ static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu)
 			struct percpu_ref_data, rcu);
 	struct percpu_ref *ref = data->ref;
 	unsigned long __percpu *percpu_count = percpu_count_ptr(ref);
+	static atomic_t underflows;
 	unsigned long count = 0;
 	int cpu;
 
@@ -191,9 +193,13 @@ static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu)
 	 */
 	atomic_long_add((long)count - PERCPU_COUNT_BIAS, &data->count);
 
-	WARN_ONCE(atomic_long_read(&data->count) <= 0,
-		  "percpu ref (%ps) <= 0 (%ld) after switching to atomic",
-		  data->release, atomic_long_read(&data->count));
+	if (WARN_ONCE(atomic_long_read(&data->count) <= 0,
+		      "percpu ref (%ps) <= 0 (%ld) after switching to atomic",
+		      data->release, atomic_long_read(&data->count)) &&
+	    atomic_inc_return(&underflows) < 4) {
+		pr_err("%s(): percpu_ref underflow", __func__);
+		mem_dump_obj(data);
+	}
 
 	/* @ref is viewed as dead on all CPUs, send out switch confirmation */
 	percpu_ref_call_confirm_rcu(rcu);
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 sl-b 1/6] mm: Add mem_dump_obj() to print source of memory block
  2020-12-11  1:19     ` [PATCH v3 sl-b 1/6] mm: Add mem_dump_obj() to print source of memory block paulmck
@ 2020-12-11  2:22       ` Joonsoo Kim
  2020-12-11  3:33         ` Paul E. McKenney
  0 siblings, 1 reply; 49+ messages in thread
From: Joonsoo Kim @ 2020-12-11  2:22 UTC (permalink / raw)
  To: paulmck
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, andrii, Christoph Lameter,
	Pekka Enberg, David Rientjes, linux-mm

On Thu, Dec 10, 2020 at 05:19:58PM -0800, paulmck@kernel.org wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> There are kernel facilities such as per-CPU reference counts that give
> error messages in generic handlers or callbacks, whose messages are
> unenlightening.  In the case of per-CPU reference-count underflow, this
> is not a problem when creating a new use of this facility because in that
> case the bug is almost certainly in the code implementing that new use.
> However, trouble arises when deploying across many systems, which might
> exercise corner cases that were not seen during development and testing.
> Here, it would be really nice to get some kind of hint as to which of
> several uses the underflow was caused by.
> 
> This commit therefore exposes a mem_dump_obj() function that takes
> a pointer to memory (which must still be allocated if it has been
> dynamically allocated) and prints available information on where that
> memory came from.  This pointer can reference the middle of the block as
> well as the beginning of the block, as needed by things like RCU callback
> functions and timer handlers that might not know where the beginning of
> the memory block is.  These functions and handlers can use mem_dump_obj()
> to print out better hints as to where the problem might lie.
> 
> The information printed can depend on kernel configuration.  For example,
> the allocation return address can be printed only for slab and slub,
> and even then only when the necessary debug has been enabled.  For slab,
> build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
> to the next power of two or use the SLAB_STORE_USER flag when creating the
> kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
> boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
> if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
> to enable printing of the allocation-time stack trace.
> 
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: <linux-mm@kvack.org>
> Reported-by: Andrii Nakryiko <andrii@kernel.org>
> [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
> [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
> [ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
> [ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>  include/linux/mm.h   |  2 ++
>  include/linux/slab.h |  2 ++
>  mm/slab.c            | 20 ++++++++++++++
>  mm/slab.h            | 12 +++++++++
>  mm/slab_common.c     | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/slob.c            |  6 +++++
>  mm/slub.c            | 36 +++++++++++++++++++++++++
>  mm/util.c            | 24 +++++++++++++++++
>  8 files changed, 176 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ef360fe..1eea266 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3153,5 +3153,7 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
>  
>  extern int sysctl_nr_trim_pages;
>  
> +void mem_dump_obj(void *object);
> +
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_MM_H */
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index dd6897f..169b511 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -186,6 +186,8 @@ void kfree(const void *);
>  void kfree_sensitive(const void *);
>  size_t __ksize(const void *);
>  size_t ksize(const void *);
> +bool kmem_valid_obj(void *object);
> +void kmem_dump_obj(void *object);
>  
>  #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
>  void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
> diff --git a/mm/slab.c b/mm/slab.c
> index b111356..66f00ad 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3633,6 +3633,26 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
>  EXPORT_SYMBOL(__kmalloc_node_track_caller);
>  #endif /* CONFIG_NUMA */
>  
> +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
> +{
> +	struct kmem_cache *cachep;
> +	unsigned int objnr;
> +	void *objp;
> +
> +	kpp->kp_ptr = object;
> +	kpp->kp_page = page;
> +	cachep = page->slab_cache;
> +	kpp->kp_slab_cache = cachep;
> +	objp = object - obj_offset(cachep);
> +	kpp->kp_data_offset = obj_offset(cachep);
> +	page = virt_to_head_page(objp);
> +	objnr = obj_to_index(cachep, page, objp);
> +	objp = index_to_obj(cachep, page, objnr);
> +	kpp->kp_objp = objp;
> +	if (DEBUG && cachep->flags & SLAB_STORE_USER)
> +		kpp->kp_ret = *dbg_userword(cachep, objp);
> +}
> +
>  /**
>   * __do_kmalloc - allocate memory
>   * @size: how many bytes of memory are required.
> diff --git a/mm/slab.h b/mm/slab.h
> index 6d7c6a5..0dc705b 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -630,4 +630,16 @@ static inline bool slab_want_init_on_free(struct kmem_cache *c)
>  	return false;
>  }
>  
> +#define KS_ADDRS_COUNT 16
> +struct kmem_obj_info {
> +	void *kp_ptr;
> +	struct page *kp_page;
> +	void *kp_objp;
> +	unsigned long kp_data_offset;
> +	struct kmem_cache *kp_slab_cache;
> +	void *kp_ret;
> +	void *kp_stack[KS_ADDRS_COUNT];
> +};
> +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page);
> +
>  #endif /* MM_SLAB_H */
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index f9ccd5d..df2e203 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -536,6 +536,80 @@ bool slab_is_available(void)
>  	return slab_state >= UP;
>  }
>  
> +/**
> + * kmem_valid_obj - does the pointer reference a valid slab object?
> + * @object: pointer to query.
> + *
> + * Return: %true if the pointer is to a not-yet-freed object from
> + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
> + * is to an already-freed object, and %false otherwise.
> + */
> +bool kmem_valid_obj(void *object)
> +{
> +	struct page *page;
> +
> +	if (!virt_addr_valid(object))
> +		return false;
> +	page = virt_to_head_page(object);
> +	return PageSlab(page);
> +}
> +
> +/**
> + * kmem_dump_obj - Print available slab provenance information
> + * @object: slab object for which to find provenance information.
> + *
> + * This function uses pr_cont(), so that the caller is expected to have
> + * printed out whatever preamble is appropriate.  The provenance information
> + * depends on the type of object and on how much debugging is enabled.
> + * For a slab-cache object, the fact that it is a slab object is printed,
> + * and, if available, the slab name, return address, and stack trace from
> + * the allocation of that object.
> + *
> + * This function will splat if passed a pointer to a non-slab object.
> + * If you are not sure what type of object you have, you should instead
> + * use mem_dump_obj().
> + */
> +void kmem_dump_obj(void *object)
> +{
> +	char *cp = IS_ENABLED(CONFIG_MMU) ? "" : "/vmalloc";
> +	int i;
> +	struct page *page;
> +	unsigned long ptroffset;
> +	struct kmem_obj_info kp = { };
> +
> +	if (WARN_ON_ONCE(!virt_addr_valid(object)))
> +		return;
> +	page = virt_to_head_page(object);
> +	if (WARN_ON_ONCE(!PageSlab(page))) {
> +		pr_cont(" non-slab memory.\n");
> +		return;
> +	}
> +	kmem_obj_info(&kp, object, page);
> +	if (kp.kp_slab_cache)
> +		pr_cont(" slab%s %s", cp, kp.kp_slab_cache->name);
> +	else
> +		pr_cont(" slab%s", cp);
> +	if (kp.kp_objp)
> +		pr_cont(" start %px", kp.kp_objp);
> +	if (kp.kp_data_offset)
> +		pr_cont(" data offset %lu", kp.kp_data_offset);

I haven't checked the code deeply, but kp_data_offset could be 0 in the
normal situation. Is it intentional not to print the offset in this case?

> +	if (kp.kp_objp) {
> +		ptroffset = ((char *)object - (char *)kp.kp_objp) - kp.kp_data_offset;
> +		pr_cont(" pointer offset %lu", ptroffset);
> +	}
> +	if (kp.kp_slab_cache && kp.kp_slab_cache->usersize)
> +		pr_cont(" size %u", kp.kp_slab_cache->usersize);
> +	if (kp.kp_ret)
> +		pr_cont(" allocated at %pS\n", kp.kp_ret);
> +	else
> +		pr_cont("\n");
> +	for (i = 0; i < ARRAY_SIZE(kp.kp_stack); i++) {
> +		if (!kp.kp_stack[i])
> +			break;
> +		pr_info("    %pS\n", kp.kp_stack[i]);
> +	}
> +}
> +
>  #ifndef CONFIG_SLOB
>  /* Create a cache during boot when no slab services are available yet */
>  void __init create_boot_cache(struct kmem_cache *s, const char *name,
> diff --git a/mm/slob.c b/mm/slob.c
> index 7cc9805..2ed1de2 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -461,6 +461,12 @@ static void slob_free(void *block, int size)
>  	spin_unlock_irqrestore(&slob_lock, flags);
>  }
>  
> +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
> +{
> +	kpp->kp_ptr = object;
> +	kpp->kp_page = page;
> +}
> +
>  /*
>   * End of slob allocator proper. Begin kmem_cache_alloc and kmalloc frontend.
>   */
> diff --git a/mm/slub.c b/mm/slub.c
> index b30be23..0459d2a 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3918,6 +3918,42 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
>  	return 0;
>  }
>  
> +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
> +{
> +#ifdef CONFIG_SLUB_DEBUG

We can get some of this info even if CONFIG_SLUB_DEBUG isn't defined.
Please move those assignments out of the #ifdef.

Thanks.
 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 sl-b 1/6] mm: Add mem_dump_obj() to print source of memory block
  2020-12-11  2:22       ` Joonsoo Kim
@ 2020-12-11  3:33         ` Paul E. McKenney
  2020-12-11  3:42           ` Paul E. McKenney
  2020-12-11  6:54           ` Joonsoo Kim
  0 siblings, 2 replies; 49+ messages in thread
From: Paul E. McKenney @ 2020-12-11  3:33 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, andrii, Christoph Lameter,
	Pekka Enberg, David Rientjes, linux-mm

On Fri, Dec 11, 2020 at 11:22:10AM +0900, Joonsoo Kim wrote:
> On Thu, Dec 10, 2020 at 05:19:58PM -0800, paulmck@kernel.org wrote:
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > There are kernel facilities such as per-CPU reference counts that give
> > error messages in generic handlers or callbacks, whose messages are
> > unenlightening.  In the case of per-CPU reference-count underflow, this
> > is not a problem when creating a new use of this facility because in that
> > case the bug is almost certainly in the code implementing that new use.
> > However, trouble arises when deploying across many systems, which might
> > exercise corner cases that were not seen during development and testing.
> > Here, it would be really nice to get some kind of hint as to which of
> > several uses the underflow was caused by.
> > 
> > This commit therefore exposes a mem_dump_obj() function that takes
> > a pointer to memory (which must still be allocated if it has been
> > dynamically allocated) and prints available information on where that
> > memory came from.  This pointer can reference the middle of the block as
> > well as the beginning of the block, as needed by things like RCU callback
> > functions and timer handlers that might not know where the beginning of
> > the memory block is.  These functions and handlers can use mem_dump_obj()
> > to print out better hints as to where the problem might lie.
> > 
> > The information printed can depend on kernel configuration.  For example,
> > the allocation return address can be printed only for slab and slub,
> > and even then only when the necessary debug has been enabled.  For slab,
> > build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
> > to the next power of two or use the SLAB_STORE_USER flag when creating the
> > kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
> > boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
> > if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
> > to enable printing of the allocation-time stack trace.
> > 
> > Cc: Christoph Lameter <cl@linux.com>
> > Cc: Pekka Enberg <penberg@kernel.org>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: <linux-mm@kvack.org>
> > Reported-by: Andrii Nakryiko <andrii@kernel.org>
> > [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
> > [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
> > [ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
> > [ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> >  include/linux/mm.h   |  2 ++
> >  include/linux/slab.h |  2 ++
> >  mm/slab.c            | 20 ++++++++++++++
> >  mm/slab.h            | 12 +++++++++
> >  mm/slab_common.c     | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  mm/slob.c            |  6 +++++
> >  mm/slub.c            | 36 +++++++++++++++++++++++++
> >  mm/util.c            | 24 +++++++++++++++++
> >  8 files changed, 176 insertions(+)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index ef360fe..1eea266 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -3153,5 +3153,7 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
> >  
> >  extern int sysctl_nr_trim_pages;
> >  
> > +void mem_dump_obj(void *object);
> > +
> >  #endif /* __KERNEL__ */
> >  #endif /* _LINUX_MM_H */
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index dd6897f..169b511 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -186,6 +186,8 @@ void kfree(const void *);
> >  void kfree_sensitive(const void *);
> >  size_t __ksize(const void *);
> >  size_t ksize(const void *);
> > +bool kmem_valid_obj(void *object);
> > +void kmem_dump_obj(void *object);
> >  
> >  #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
> >  void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
> > diff --git a/mm/slab.c b/mm/slab.c
> > index b111356..66f00ad 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -3633,6 +3633,26 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
> >  EXPORT_SYMBOL(__kmalloc_node_track_caller);
> >  #endif /* CONFIG_NUMA */
> >  
> > +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
> > +{
> > +	struct kmem_cache *cachep;
> > +	unsigned int objnr;
> > +	void *objp;
> > +
> > +	kpp->kp_ptr = object;
> > +	kpp->kp_page = page;
> > +	cachep = page->slab_cache;
> > +	kpp->kp_slab_cache = cachep;
> > +	objp = object - obj_offset(cachep);
> > +	kpp->kp_data_offset = obj_offset(cachep);
> > +	page = virt_to_head_page(objp);
> > +	objnr = obj_to_index(cachep, page, objp);
> > +	objp = index_to_obj(cachep, page, objnr);
> > +	kpp->kp_objp = objp;
> > +	if (DEBUG && cachep->flags & SLAB_STORE_USER)
> > +		kpp->kp_ret = *dbg_userword(cachep, objp);
> > +}
> > +
> >  /**
> >   * __do_kmalloc - allocate memory
> >   * @size: how many bytes of memory are required.
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 6d7c6a5..0dc705b 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -630,4 +630,16 @@ static inline bool slab_want_init_on_free(struct kmem_cache *c)
> >  	return false;
> >  }
> >  
> > +#define KS_ADDRS_COUNT 16
> > +struct kmem_obj_info {
> > +	void *kp_ptr;
> > +	struct page *kp_page;
> > +	void *kp_objp;
> > +	unsigned long kp_data_offset;
> > +	struct kmem_cache *kp_slab_cache;
> > +	void *kp_ret;
> > +	void *kp_stack[KS_ADDRS_COUNT];
> > +};
> > +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page);
> > +
> >  #endif /* MM_SLAB_H */
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index f9ccd5d..df2e203 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -536,6 +536,80 @@ bool slab_is_available(void)
> >  	return slab_state >= UP;
> >  }
> >  
> > +/**
> > + * kmem_valid_obj - does the pointer reference a valid slab object?
> > + * @object: pointer to query.
> > + *
> > + * Return: %true if the pointer is to a not-yet-freed object from
> > + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
> > + * is to an already-freed object, and %false otherwise.
> > + */
> > +bool kmem_valid_obj(void *object)
> > +{
> > +	struct page *page;
> > +
> > +	if (!virt_addr_valid(object))
> > +		return false;
> > +	page = virt_to_head_page(object);
> > +	return PageSlab(page);
> > +}
> > +
> > +/**
> > + * kmem_dump_obj - Print available slab provenance information
> > + * @object: slab object for which to find provenance information.
> > + *
> > + * This function uses pr_cont(), so that the caller is expected to have
> > + * printed out whatever preamble is appropriate.  The provenance information
> > + * depends on the type of object and on how much debugging is enabled.
> > + * For a slab-cache object, the fact that it is a slab object is printed,
> > + * and, if available, the slab name, return address, and stack trace from
> > + * the allocation of that object.
> > + *
> > + * This function will splat if passed a pointer to a non-slab object.
> > + * If you are not sure what type of object you have, you should instead
> > + * use mem_dump_obj().
> > + */
> > +void kmem_dump_obj(void *object)
> > +{
> > +	char *cp = IS_ENABLED(CONFIG_MMU) ? "" : "/vmalloc";
> > +	int i;
> > +	struct page *page;
> > +	unsigned long ptroffset;
> > +	struct kmem_obj_info kp = { };
> > +
> > +	if (WARN_ON_ONCE(!virt_addr_valid(object)))
> > +		return;
> > +	page = virt_to_head_page(object);
> > +	if (WARN_ON_ONCE(!PageSlab(page))) {
> > +		pr_cont(" non-slab memory.\n");
> > +		return;
> > +	}
> > +	kmem_obj_info(&kp, object, page);
> > +	if (kp.kp_slab_cache)
> > +		pr_cont(" slab%s %s", cp, kp.kp_slab_cache->name);
> > +	else
> > +		pr_cont(" slab%s", cp);
> > +	if (kp.kp_objp)
> > +		pr_cont(" start %px", kp.kp_objp);
> > +	if (kp.kp_data_offset)
> > +		pr_cont(" data offset %lu", kp.kp_data_offset);
> 
> I haven't checked the code deeply, but kp_data_offset could be 0 in the
> normal situation. Is it intentional not to print the offset in this case?

Yes, the intent is to print the offset only if it is non-zero, which as
you say happens only if certain debugging options are enabled.  It is
easy to print it unconditionally if that is preferred!

> > +	if (kp.kp_objp) {
> > +		ptroffset = ((char *)object - (char *)kp.kp_objp) - kp.kp_data_offset;
> > +		pr_cont(" pointer offset %lu", ptroffset);
> > +	}
> > +	if (kp.kp_slab_cache && kp.kp_slab_cache->usersize)
> > +		pr_cont(" size %u", kp.kp_slab_cache->usersize);
> > +	if (kp.kp_ret)
> > +		pr_cont(" allocated at %pS\n", kp.kp_ret);
> > +	else
> > +		pr_cont("\n");
> > +	for (i = 0; i < ARRAY_SIZE(kp.kp_stack); i++) {
> > +		if (!kp.kp_stack[i])
> > +			break;
> > +		pr_info("    %pS\n", kp.kp_stack[i]);
> > +	}
> > +}
> > +
> >  #ifndef CONFIG_SLOB
> >  /* Create a cache during boot when no slab services are available yet */
> >  void __init create_boot_cache(struct kmem_cache *s, const char *name,
> > diff --git a/mm/slob.c b/mm/slob.c
> > index 7cc9805..2ed1de2 100644
> > --- a/mm/slob.c
> > +++ b/mm/slob.c
> > @@ -461,6 +461,12 @@ static void slob_free(void *block, int size)
> >  	spin_unlock_irqrestore(&slob_lock, flags);
> >  }
> >  
> > +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
> > +{
> > +	kpp->kp_ptr = object;
> > +	kpp->kp_page = page;
> > +}
> > +
> >  /*
> >   * End of slob allocator proper. Begin kmem_cache_alloc and kmalloc frontend.
> >   */
> > diff --git a/mm/slub.c b/mm/slub.c
> > index b30be23..0459d2a 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3918,6 +3918,42 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
> >  	return 0;
> >  }
> >  
> > +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
> > +{
> > +#ifdef CONFIG_SLUB_DEBUG
> 
> We can get some of this info even if CONFIG_SLUB_DEBUG isn't defined.
> Please move those assignments out of the #ifdef.

I guess since I worry about CONFIG_MMU=n it only makes sense to also
worry about CONFIG_SLUB_DEBUG=n.  Fix update.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 sl-b 1/6] mm: Add mem_dump_obj() to print source of memory block
  2020-12-11  3:33         ` Paul E. McKenney
@ 2020-12-11  3:42           ` Paul E. McKenney
  2020-12-11  6:58             ` Joonsoo Kim
  2020-12-11  6:54           ` Joonsoo Kim
  1 sibling, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2020-12-11  3:42 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, andrii, Christoph Lameter,
	Pekka Enberg, David Rientjes, linux-mm

On Thu, Dec 10, 2020 at 07:33:59PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 11, 2020 at 11:22:10AM +0900, Joonsoo Kim wrote:
> > On Thu, Dec 10, 2020 at 05:19:58PM -0800, paulmck@kernel.org wrote:
> > > From: "Paul E. McKenney" <paulmck@kernel.org>

[ . . . ]

> > We can get some of this info even if CONFIG_SLUB_DEBUG isn't defined.
> > Please move those assignments out of the #ifdef.
> 
> I guess since I worry about CONFIG_MMU=n it only makes sense to also
> worry about CONFIG_SLUB_DEBUG=n.  Fix update.

Like this?  (Patch on top of the series, to be folded into the first one.)

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/mm/slub.c b/mm/slub.c
index 0459d2a..abf43f0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3920,21 +3920,24 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
 
 void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
 {
-#ifdef CONFIG_SLUB_DEBUG
 	void *base;
-	int i;
+	int __maybe_unused i;
 	unsigned int objnr;
 	void *objp;
 	void *objp0;
 	struct kmem_cache *s = page->slab_cache;
-	struct track *trackp;
+	struct track __maybe_unused *trackp;
 
 	kpp->kp_ptr = object;
 	kpp->kp_page = page;
 	kpp->kp_slab_cache = s;
 	base = page_address(page);
 	objp0 = kasan_reset_tag(object);
+#ifdef CONFIG_SLUB_DEBUG
 	objp = restore_red_left(s, objp0);
+#else
+	objp = objp0;
+#endif
 	objnr = obj_to_index(s, page, objp);
 	kpp->kp_data_offset = (unsigned long)((char *)objp0 - (char *)objp);
 	objp = base + s->size * objnr;
@@ -3942,6 +3945,7 @@ void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
 	if (WARN_ON_ONCE(objp < base || objp >= base + page->objects * s->size || (objp - base) % s->size) ||
 	    !(s->flags & SLAB_STORE_USER))
 		return;
+#ifdef CONFIG_SLUB_DEBUG
 	trackp = get_track(s, objp, TRACK_ALLOC);
 	kpp->kp_ret = (void *)trackp->addr;
 #ifdef CONFIG_STACKTRACE

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 sl-b 1/6] mm: Add mem_dump_obj() to print source of memory block
  2020-12-11  3:33         ` Paul E. McKenney
  2020-12-11  3:42           ` Paul E. McKenney
@ 2020-12-11  6:54           ` Joonsoo Kim
  1 sibling, 0 replies; 49+ messages in thread
From: Joonsoo Kim @ 2020-12-11  6:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, andrii, Christoph Lameter,
	Pekka Enberg, David Rientjes, linux-mm

On Thu, Dec 10, 2020 at 07:33:59PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 11, 2020 at 11:22:10AM +0900, Joonsoo Kim wrote:
> > On Thu, Dec 10, 2020 at 05:19:58PM -0800, paulmck@kernel.org wrote:
> > > From: "Paul E. McKenney" <paulmck@kernel.org>
> > > 
> > > There are kernel facilities such as per-CPU reference counts that give
> > > error messages in generic handlers or callbacks, whose messages are
> > > unenlightening.  In the case of per-CPU reference-count underflow, this
> > > is not a problem when creating a new use of this facility because in that
> > > case the bug is almost certainly in the code implementing that new use.
> > > However, trouble arises when deploying across many systems, which might
> > > exercise corner cases that were not seen during development and testing.
> > > Here, it would be really nice to get some kind of hint as to which of
> > > several uses the underflow was caused by.
> > > 
> > > This commit therefore exposes a mem_dump_obj() function that takes
> > > a pointer to memory (which must still be allocated if it has been
> > > dynamically allocated) and prints available information on where that
> > > memory came from.  This pointer can reference the middle of the block as
> > > well as the beginning of the block, as needed by things like RCU callback
> > > functions and timer handlers that might not know where the beginning of
> > > the memory block is.  These functions and handlers can use mem_dump_obj()
> > > to print out better hints as to where the problem might lie.
> > > 
> > > The information printed can depend on kernel configuration.  For example,
> > > the allocation return address can be printed only for slab and slub,
> > > and even then only when the necessary debug has been enabled.  For slab,
> > > build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
> > > to the next power of two or use the SLAB_STORE_USER flag when creating the
> > > kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
> > > boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
> > > if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
> > > to enable printing of the allocation-time stack trace.
> > > 
> > > Cc: Christoph Lameter <cl@linux.com>
> > > Cc: Pekka Enberg <penberg@kernel.org>
> > > Cc: David Rientjes <rientjes@google.com>
> > > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: <linux-mm@kvack.org>
> > > Reported-by: Andrii Nakryiko <andrii@kernel.org>
> > > [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
> > > [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
> > > [ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
> > > [ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > ---
> > >  include/linux/mm.h   |  2 ++
> > >  include/linux/slab.h |  2 ++
> > >  mm/slab.c            | 20 ++++++++++++++
> > >  mm/slab.h            | 12 +++++++++
> > >  mm/slab_common.c     | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  mm/slob.c            |  6 +++++
> > >  mm/slub.c            | 36 +++++++++++++++++++++++++
> > >  mm/util.c            | 24 +++++++++++++++++
> > >  8 files changed, 176 insertions(+)
> > > 
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index ef360fe..1eea266 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -3153,5 +3153,7 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
> > >  
> > >  extern int sysctl_nr_trim_pages;
> > >  
> > > +void mem_dump_obj(void *object);
> > > +
> > >  #endif /* __KERNEL__ */
> > >  #endif /* _LINUX_MM_H */
> > > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > > index dd6897f..169b511 100644
> > > --- a/include/linux/slab.h
> > > +++ b/include/linux/slab.h
> > > @@ -186,6 +186,8 @@ void kfree(const void *);
> > >  void kfree_sensitive(const void *);
> > >  size_t __ksize(const void *);
> > >  size_t ksize(const void *);
> > > +bool kmem_valid_obj(void *object);
> > > +void kmem_dump_obj(void *object);
> > >  
> > >  #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
> > >  void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
> > > diff --git a/mm/slab.c b/mm/slab.c
> > > index b111356..66f00ad 100644
> > > --- a/mm/slab.c
> > > +++ b/mm/slab.c
> > > @@ -3633,6 +3633,26 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
> > >  EXPORT_SYMBOL(__kmalloc_node_track_caller);
> > >  #endif /* CONFIG_NUMA */
> > >  
> > > +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
> > > +{
> > > +	struct kmem_cache *cachep;
> > > +	unsigned int objnr;
> > > +	void *objp;
> > > +
> > > +	kpp->kp_ptr = object;
> > > +	kpp->kp_page = page;
> > > +	cachep = page->slab_cache;
> > > +	kpp->kp_slab_cache = cachep;
> > > +	objp = object - obj_offset(cachep);
> > > +	kpp->kp_data_offset = obj_offset(cachep);
> > > +	page = virt_to_head_page(objp);
> > > +	objnr = obj_to_index(cachep, page, objp);
> > > +	objp = index_to_obj(cachep, page, objnr);
> > > +	kpp->kp_objp = objp;
> > > +	if (DEBUG && cachep->flags & SLAB_STORE_USER)
> > > +		kpp->kp_ret = *dbg_userword(cachep, objp);
> > > +}
> > > +
> > >  /**
> > >   * __do_kmalloc - allocate memory
> > >   * @size: how many bytes of memory are required.
> > > diff --git a/mm/slab.h b/mm/slab.h
> > > index 6d7c6a5..0dc705b 100644
> > > --- a/mm/slab.h
> > > +++ b/mm/slab.h
> > > @@ -630,4 +630,16 @@ static inline bool slab_want_init_on_free(struct kmem_cache *c)
> > >  	return false;
> > >  }
> > >  
> > > +#define KS_ADDRS_COUNT 16
> > > +struct kmem_obj_info {
> > > +	void *kp_ptr;
> > > +	struct page *kp_page;
> > > +	void *kp_objp;
> > > +	unsigned long kp_data_offset;
> > > +	struct kmem_cache *kp_slab_cache;
> > > +	void *kp_ret;
> > > +	void *kp_stack[KS_ADDRS_COUNT];
> > > +};
> > > +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page);
> > > +
> > >  #endif /* MM_SLAB_H */
> > > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > > index f9ccd5d..df2e203 100644
> > > --- a/mm/slab_common.c
> > > +++ b/mm/slab_common.c
> > > @@ -536,6 +536,80 @@ bool slab_is_available(void)
> > >  	return slab_state >= UP;
> > >  }
> > >  
> > > +/**
> > > + * kmem_valid_obj - does the pointer reference a valid slab object?
> > > + * @object: pointer to query.
> > > + *
> > > + * Return: %true if the pointer is to a not-yet-freed object from
> > > + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
> > > + * is to an already-freed object, and %false otherwise.
> > > + */
> > > +bool kmem_valid_obj(void *object)
> > > +{
> > > +	struct page *page;
> > > +
> > > +	if (!virt_addr_valid(object))
> > > +		return false;
> > > +	page = virt_to_head_page(object);
> > > +	return PageSlab(page);
> > > +}
> > > +
> > > +/**
> > > + * kmem_dump_obj - Print available slab provenance information
> > > + * @object: slab object for which to find provenance information.
> > > + *
> > > + * This function uses pr_cont(), so that the caller is expected to have
> > > + * printed out whatever preamble is appropriate.  The provenance information
> > > + * depends on the type of object and on how much debugging is enabled.
> > > + * For a slab-cache object, the fact that it is a slab object is printed,
> > > + * and, if available, the slab name, return address, and stack trace from
> > > + * the allocation of that object.
> > > + *
> > > + * This function will splat if passed a pointer to a non-slab object.
> > > + * If you are not sure what type of object you have, you should instead
> > > + * use mem_dump_obj().
> > > + */
> > > +void kmem_dump_obj(void *object)
> > > +{
> > > +	char *cp = IS_ENABLED(CONFIG_MMU) ? "" : "/vmalloc";
> > > +	int i;
> > > +	struct page *page;
> > > +	unsigned long ptroffset;
> > > +	struct kmem_obj_info kp = { };
> > > +
> > > +	if (WARN_ON_ONCE(!virt_addr_valid(object)))
> > > +		return;
> > > +	page = virt_to_head_page(object);
> > > +	if (WARN_ON_ONCE(!PageSlab(page))) {
> > > +		pr_cont(" non-slab memory.\n");
> > > +		return;
> > > +	}
> > > +	kmem_obj_info(&kp, object, page);
> > > +	if (kp.kp_slab_cache)
> > > +		pr_cont(" slab%s %s", cp, kp.kp_slab_cache->name);
> > > +	else
> > > +		pr_cont(" slab%s", cp);
> > > +	if (kp.kp_objp)
> > > +		pr_cont(" start %px", kp.kp_objp);
> > > +	if (kp.kp_data_offset)
> > > +		pr_cont(" data offset %lu", kp.kp_data_offset);
> > 
> > I haven't checked the code deeply, but kp_data_offset could be 0 in the
> > normal situation. Is it intentional not to print it in this case?
> 
> Yes, so that it tells you the offset only if it is non-zero, which as
> you say happens only if certain debugging options are enabled.  Easy to
> print it unconditionally if that is preferred!

Okay, I have no preference here. I was just asking to make sure I
understand the code correctly.
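To make the two offsets concrete, a worked example with made-up numbers:
suppose a debug option places the object's data 64 bytes into the slab
object (so kp_data_offset is 64), and the pointer passed in lies 16 bytes
past the start of that data:

	object         = kp.kp_objp + 64 + 16
	data offset    = kp.kp_data_offset                           = 64
	pointer offset = (object - kp.kp_objp) - kp.kp_data_offset   = 16

Without such debugging, kp_data_offset is 0, so the "data offset" field is
skipped and only the pointer offset is printed.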

> 
> > > +	if (kp.kp_objp) {
> > > +		ptroffset = ((char *)object - (char *)kp.kp_objp) - kp.kp_data_offset;
> > > +		pr_cont(" pointer offset %lu", ptroffset);
> > > +	}
> > > +	if (kp.kp_slab_cache && kp.kp_slab_cache->usersize)
> > > +		pr_cont(" size %u", kp.kp_slab_cache->usersize);
> > > +	if (kp.kp_ret)
> > > +		pr_cont(" allocated at %pS\n", kp.kp_ret);
> > > +	else
> > > +		pr_cont("\n");
> > > +	for (i = 0; i < ARRAY_SIZE(kp.kp_stack); i++) {
> > > +		if (!kp.kp_stack[i])
> > > +			break;
> > > +		pr_info("    %pS\n", kp.kp_stack[i]);
> > > +	}
> > > +}
> > > +
> > >  #ifndef CONFIG_SLOB
> > >  /* Create a cache during boot when no slab services are available yet */
> > >  void __init create_boot_cache(struct kmem_cache *s, const char *name,
> > > diff --git a/mm/slob.c b/mm/slob.c
> > > index 7cc9805..2ed1de2 100644
> > > --- a/mm/slob.c
> > > +++ b/mm/slob.c
> > > @@ -461,6 +461,12 @@ static void slob_free(void *block, int size)
> > >  	spin_unlock_irqrestore(&slob_lock, flags);
> > >  }
> > >  
> > > +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
> > > +{
> > > +	kpp->kp_ptr = object;
> > > +	kpp->kp_page = page;
> > > +}
> > > +
> > >  /*
> > >   * End of slob allocator proper. Begin kmem_cache_alloc and kmalloc frontend.
> > >   */
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index b30be23..0459d2a 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -3918,6 +3918,42 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
> > >  	return 0;
> > >  }
> > >  
> > > +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
> > > +{
> > > +#ifdef CONFIG_SLUB_DEBUG
> > 
> > We can get some info even if CONFIG_SLUB_DEBUG isn't defined.
> > Please move those assignments out.
> 
> I guess that since I worry about CONFIG_MMU=n, it only makes sense to
> also worry about CONFIG_SLUB_DEBUG=n.  Fix updated.

Okay!

Thanks.
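
The follow-up patch ("Like this?") appears later in the thread.  As a
sketch of the direction agreed on here -- generic fields filled in
unconditionally, with only the allocation-tracking fields left under the
#ifdef -- slub's kmem_obj_info() might be restructured along these lines
(get_track() and struct track are slub-internal helpers, and the red-zone
and KASAN pointer adjustments are omitted for brevity):

void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
{
	unsigned int objnr;
	void *base;
	struct kmem_cache *s = page->slab_cache;

	/* This much is available regardless of debug configuration. */
	kpp->kp_ptr = object;
	kpp->kp_page = page;
	kpp->kp_slab_cache = s;
	base = page_address(page);
	objnr = obj_to_index(s, page, object);
	kpp->kp_objp = base + s->size * objnr;
#ifdef CONFIG_SLUB_DEBUG
	/* The allocation return address needs SLAB_STORE_USER tracking. */
	if (s->flags & SLAB_STORE_USER) {
		struct track *trackp;

		trackp = get_track(s, kpp->kp_objp, TRACK_ALLOC);
		kpp->kp_ret = (void *)trackp->addr;
	}
#endif
}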


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 sl-b 1/6] mm: Add mem_dump_obj() to print source of memory block
  2020-12-11  3:42           ` Paul E. McKenney
@ 2020-12-11  6:58             ` Joonsoo Kim
  2020-12-11 16:59               ` Paul E. McKenney
  0 siblings, 1 reply; 49+ messages in thread
From: Joonsoo Kim @ 2020-12-11  6:58 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, andrii, Christoph Lameter,
	Pekka Enberg, David Rientjes, linux-mm

On Thu, Dec 10, 2020 at 07:42:27PM -0800, Paul E. McKenney wrote:
> On Thu, Dec 10, 2020 at 07:33:59PM -0800, Paul E. McKenney wrote:
> > On Fri, Dec 11, 2020 at 11:22:10AM +0900, Joonsoo Kim wrote:
> > > On Thu, Dec 10, 2020 at 05:19:58PM -0800, paulmck@kernel.org wrote:
> > > > From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> [ . . . ]
> 
> > > We can get some info even if CONFIG_SLUB_DEBUG isn't defined.
> > > Please move those assignments out.
> > 
> > I guess that since I worry about CONFIG_MMU=n, it only makes sense to
> > also worry about CONFIG_SLUB_DEBUG=n.  Fix updated.
> 
> Like this?  (Patch on top of the series, to be folded into the first one.)

Yes!

Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Thanks.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 sl-b 1/6] mm: Add mem_dump_obj() to print source of memory block
  2020-12-11  6:58             ` Joonsoo Kim
@ 2020-12-11 16:59               ` Paul E. McKenney
  0 siblings, 0 replies; 49+ messages in thread
From: Paul E. McKenney @ 2020-12-11 16:59 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, andrii, Christoph Lameter,
	Pekka Enberg, David Rientjes, linux-mm

On Fri, Dec 11, 2020 at 03:58:51PM +0900, Joonsoo Kim wrote:
> On Thu, Dec 10, 2020 at 07:42:27PM -0800, Paul E. McKenney wrote:
> > On Thu, Dec 10, 2020 at 07:33:59PM -0800, Paul E. McKenney wrote:
> > > On Fri, Dec 11, 2020 at 11:22:10AM +0900, Joonsoo Kim wrote:
> > > > On Thu, Dec 10, 2020 at 05:19:58PM -0800, paulmck@kernel.org wrote:
> > > > > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > [ . . . ]
> > 
> > > > We can get some info even if CONFIG_SLUB_DEBUG isn't defined.
> > > > Please move those assignments out.
> > > 
> > > I guess that since I worry about CONFIG_MMU=n, it only makes sense to
> > > also worry about CONFIG_SLUB_DEBUG=n.  Fix updated.
> > 
> > Like this?  (Patch on top of the series, to be folded into the first one.)
> 
> Yes!
> 
> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Applied, and thank you again for the review and feedback!

Any suggestions on where to route these patches?  Left to my own devices,
they will go via the -rcu tree in the v5.12 merge window.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 49+ messages in thread

Thread overview: 49+ messages
2020-12-05  0:40 [PATCH RFC sl-b] Export return addresses for better diagnostics Paul E. McKenney
2020-12-05  0:40 ` [PATCH sl-b 1/6] mm: Add kmem_last_alloc() to return last allocation for memory block paulmck
2020-12-07  9:02   ` Joonsoo Kim
2020-12-07 17:25     ` Paul E. McKenney
2020-12-08  8:57       ` Joonsoo Kim
2020-12-08 15:17         ` Paul E. McKenney
2020-12-05  0:40 ` [PATCH sl-b 2/6] mm: Add kmem_last_alloc_errstring() to provide more kmem_last_alloc() info paulmck
2020-12-05  0:40 ` [PATCH sl-b 3/6] rcu: Make call_rcu() print allocation address of double-freed callback paulmck
2020-12-05  0:40 ` [PATCH sl-b 4/6] mm: Create kmem_last_alloc_stack() to provide stack trace in slub paulmck
2020-12-05  0:40 ` [PATCH sl-b 5/6] percpu_ref: Print allocator upon reference-count underflow paulmck
2020-12-05  0:40 ` [PATCH sl-b 6/6] percpu_ref: Print stack trace " paulmck
2020-12-09  1:11 ` [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics Paul E. McKenney
2020-12-09  1:12   ` [PATCH v2 sl-b 1/5] mm: Add mem_dump_obj() to print source of memory block paulmck
2020-12-09  8:17     ` Christoph Hellwig
2020-12-09 14:57       ` Paul E. McKenney
2020-12-09 17:53         ` Christoph Hellwig
2020-12-09 17:59           ` Paul E. McKenney
2020-12-09 17:28     ` Vlastimil Babka
2020-12-09 23:04       ` Paul E. McKenney
2020-12-10 10:48         ` Vlastimil Babka
2020-12-10 19:56           ` Paul E. McKenney
2020-12-10 12:04     ` Joonsoo Kim
2020-12-10 23:41       ` Paul E. McKenney
2020-12-09  1:13   ` [PATCH v2 sl-b 2/5] mm: Make mem_dump_obj() handle NULL and zero-sized pointers paulmck
2020-12-09 17:48     ` Vlastimil Babka
2020-12-10  3:25       ` Paul E. McKenney
2020-12-09  1:13   ` [PATCH v2 sl-b 3/5] mm: Make mem_dump_obj() handle vmalloc() memory paulmck
2020-12-09 17:51     ` Vlastimil Babka
2020-12-09 19:39       ` Uladzislau Rezki
2020-12-09 23:23       ` Paul E. McKenney
2020-12-10 10:49         ` Vlastimil Babka
2020-12-09 19:36     ` Uladzislau Rezki
2020-12-09 19:42       ` Paul E. McKenney
2020-12-09 20:04         ` Uladzislau Rezki
2020-12-09  1:13   ` [PATCH v2 sl-b 4/5] rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback paulmck
2020-12-09  1:13   ` [PATCH v2 sl-b 5/5] percpu_ref: Dump mem_dump_obj() info upon reference-count underflow paulmck
2020-12-11  1:19   ` [PATCH RFC v2 sl-b] Export return addresses etc. for better diagnostics Paul E. McKenney
2020-12-11  1:19     ` [PATCH v3 sl-b 1/6] mm: Add mem_dump_obj() to print source of memory block paulmck
2020-12-11  2:22       ` Joonsoo Kim
2020-12-11  3:33         ` Paul E. McKenney
2020-12-11  3:42           ` Paul E. McKenney
2020-12-11  6:58             ` Joonsoo Kim
2020-12-11 16:59               ` Paul E. McKenney
2020-12-11  6:54           ` Joonsoo Kim
2020-12-11  1:19     ` [PATCH v3 sl-b 2/6] mm: Make mem_dump_obj() handle NULL and zero-sized pointers paulmck
2020-12-11  1:20     ` [PATCH v3 sl-b 3/6] mm: Make mem_dump_obj() handle vmalloc() memory paulmck
2020-12-11  1:20     ` [PATCH v3 sl-b 4/6] mm: Make mem_obj_dump() vmalloc() dumps include start and length paulmck
2020-12-11  1:20     ` [PATCH v3 sl-b 5/6] rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback paulmck
2020-12-11  1:20     ` [PATCH v3 sl-b 6/6] percpu_ref: Dump mem_dump_obj() info upon reference-count underflow paulmck
