* [PATCH v3 00/15] common kmalloc v3
@ 2022-07-12 13:39 Hyeonggon Yoo
  2022-07-12 13:39 ` [PATCH v3 1/15] mm/slab: move NUMA-related code to __do_cache_alloc() Hyeonggon Yoo
                   ` (16 more replies)
  0 siblings, 17 replies; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-12 13:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo,
	Joe Perches, Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

This is v3 of the common kmalloc series.

This series generalizes most of the kmalloc functions and moves their
implementation into mm/slab_common.c.

I believe this series improves the maintainability of the SLAB and SLUB code.
Please consider applying.

This series is based on slab/for-next and also available at
https://github.com/hygoni/linux/tree/slab-common-v3r0

Any feedback/reviews would be appreciated.

v2->v3:
	- This series no longer unifies the tracepoints between kmalloc and
	  kmem_cache_alloc(), and no longer prints the cache name.
	  Instead, it implements common alloc/free functions without
	  tracepoints.
	
	- Dropped the UMA version of the tracepoints and removed the _node
	  suffix from the NUMA version. (Vlastimil Babka)

	- Split the kmem_alloc event class and defined the kmem_cache_alloc
	  and kmalloc events using the TRACE_EVENT() macro, and avoided
	  dereferencing fields of struct kmem_cache when tracepoints are
	  not in use. (Vlastimil Babka)

	- Dropped the patches cleaning up kmalloc() to avoid increasing the
	  kernel image size. (Joe Perches)

	- Tried to avoid affecting inlined kmalloc() calls
	  and kmem_cache_alloc() variants.
	
	- Fixed an uninitialized-variable bug in SLAB with NUMA.

	- Fixed an issue where kmalloc_node_track_caller() printed a useless
	  caller when it called kmalloc_large_node().

	- Call WARN_ON() when __ksize() returns zero. (Christoph Lameter)

	- Addressed feedback from Vlastimil Babka.
	  (coding style, patch organization, etc.)

Thank you all for feedback :)

===== sequence of patches =====

Patches 1-2 make slab_alloc_node() available for non-NUMA
configurations as groundwork for the generalization.

Patches 3-5 remove duplicate code from the common kmalloc functions.

Patches 6-10 move the implementation of kmalloc_large{,_node}()
to slab_common.c and make SLAB pass requests larger than an
order-1 page to the page allocator (like SLUB).

Patches 11-12 generalize most of the kmalloc functions; a rough sketch
of the resulting common path is shown below.

Patch 13 drops the kmem_alloc event class, renames kmem_alloc_node
to kmem_alloc, and removes the _node suffix from its events.

Patch 14 drops the kmem_alloc event class and then defines the kmalloc
and kmem_cache_alloc events using the TRACE_EVENT() macro.
It also avoids dereferencing struct kmem_cache fields when tracepoints
are not in use.

Patches 15-16 are small improvements to __ksize(). They are not part
of the generalization but depend on this series.
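
For reference, the common path that patches 11-12 converge on looks
roughly like the sketch below. This is a simplified illustration pieced
together from the diffs in this series, not the exact final code; see
mm/slab_common.c after patch 12 for the real implementation.

/*
 * Sketch of the generalized kmalloc path: large requests go straight to
 * the page allocator, everything else goes through the per-allocator
 * __kmem_cache_alloc_node() hook introduced in patch 11, and the
 * tracepoint is emitted once at this common layer.
 */
static __always_inline
void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
{
	struct kmem_cache *s;
	void *ret;

	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
		ret = kmalloc_large_node_notrace(size, flags, node);
		trace_kmalloc_node(caller, ret, NULL, size,
				   PAGE_SIZE << get_order(size), flags, node);
		return ret;
	}

	s = kmalloc_slab(size, flags);
	if (unlikely(ZERO_OR_NULL_PTR(s)))
		return s;

	ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
	ret = kasan_kmalloc(s, ret, size, flags);
	trace_kmalloc_node(caller, ret, s, size, s->size, flags, node);
	return ret;
}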

  mm/slab: move NUMA-related code to __do_cache_alloc()
  mm/slab: cleanup slab_alloc() and slab_alloc_node()
  mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions
  mm/slab_common: cleanup kmalloc_track_caller()
  mm/sl[au]b: factor out __do_kmalloc_node()
  mm/sl[auo]b: fold kmalloc_order_trace() into kmalloc_large()
  mm/slub: move kmalloc_large_node() to slab_common.c
  mm/slab_common: kmalloc_node: pass large requests to page allocator
  mm/slab_common: cleanup kmalloc_large()
  mm/slab: kmalloc: pass requests larger than order-1 page to page
    allocator
  mm/sl[au]b: introduce common alloc/free functions without tracepoint
  mm/sl[au]b: generalize kmalloc subsystem
  mm/slab_common: unify NUMA and UMA version of tracepoints
  mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not
    using
  mm/sl[auo]b: move definition of __ksize() to mm/slab.h
  mm/sl[au]b: check if large object is valid in __ksize()

 include/linux/slab.h        | 117 +++++----------
 include/trace/events/kmem.h |  68 +++------
 mm/slab.c                   | 290 ++++++++++--------------------------
 mm/slab.h                   |  11 ++
 mm/slab_common.c            | 176 +++++++++++++++++++---
 mm/slob.c                   |  28 +---
 mm/slub.c                   | 218 +++------------------------
 7 files changed, 320 insertions(+), 588 deletions(-)

-- 
2.34.1


* [PATCH v3 1/15] mm/slab: move NUMA-related code to __do_cache_alloc()
  2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
@ 2022-07-12 13:39 ` Hyeonggon Yoo
  2022-07-12 14:29   ` Christoph Lameter
  2022-07-12 13:39 ` [PATCH v3 2/15] mm/slab: cleanup slab_alloc() and slab_alloc_node() Hyeonggon Yoo
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-12 13:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo,
	Joe Perches, Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

To make slab_alloc_node() independent of the NUMA configuration, move
the NUMA fallback/alternate allocation code into __do_cache_alloc().

One functional change is that the availability of the node is no longer
checked when allocating from the local node.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---

v3:
	Fixed an uninitialized-variable bug caused by the missing
	NULL initialization of objp.

 mm/slab.c | 68 +++++++++++++++++++++++++------------------------------
 1 file changed, 31 insertions(+), 37 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 764cbadba69c..3d83d17ff3b3 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3186,13 +3186,14 @@ static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
 	return obj ? obj : fallback_alloc(cachep, flags);
 }
 
+static void *__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid);
+
 static __always_inline void *
 slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_size,
 		   unsigned long caller)
 {
 	unsigned long save_flags;
 	void *ptr;
-	int slab_node = numa_mem_id();
 	struct obj_cgroup *objcg = NULL;
 	bool init = false;
 
@@ -3207,30 +3208,7 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_
 
 	cache_alloc_debugcheck_before(cachep, flags);
 	local_irq_save(save_flags);
-
-	if (nodeid == NUMA_NO_NODE)
-		nodeid = slab_node;
-
-	if (unlikely(!get_node(cachep, nodeid))) {
-		/* Node not bootstrapped yet */
-		ptr = fallback_alloc(cachep, flags);
-		goto out;
-	}
-
-	if (nodeid == slab_node) {
-		/*
-		 * Use the locally cached objects if possible.
-		 * However ____cache_alloc does not allow fallback
-		 * to other nodes. It may fail while we still have
-		 * objects on other nodes available.
-		 */
-		ptr = ____cache_alloc(cachep, flags);
-		if (ptr)
-			goto out;
-	}
-	/* ___cache_alloc_node can fall back to other nodes */
-	ptr = ____cache_alloc_node(cachep, flags, nodeid);
-out:
+	ptr = __do_cache_alloc(cachep, flags, nodeid);
 	local_irq_restore(save_flags);
 	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
 	init = slab_want_init_on_alloc(flags, cachep);
@@ -3241,31 +3219,46 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_
 }
 
 static __always_inline void *
-__do_cache_alloc(struct kmem_cache *cache, gfp_t flags)
+__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 {
-	void *objp;
+	void *objp = NULL;
+	int slab_node = numa_mem_id();
 
-	if (current->mempolicy || cpuset_do_slab_mem_spread()) {
-		objp = alternate_node_alloc(cache, flags);
-		if (objp)
-			goto out;
+	if (nodeid == NUMA_NO_NODE) {
+		if (current->mempolicy || cpuset_do_slab_mem_spread()) {
+			objp = alternate_node_alloc(cachep, flags);
+			if (objp)
+				goto out;
+		}
+		/*
+		 * Use the locally cached objects if possible.
+		 * However ____cache_alloc does not allow fallback
+		 * to other nodes. It may fail while we still have
+		 * objects on other nodes available.
+		 */
+		objp = ____cache_alloc(cachep, flags);
+		nodeid = slab_node;
+	} else if (nodeid == slab_node) {
+		objp = ____cache_alloc(cachep, flags);
+	} else if (!get_node(cachep, nodeid)) {
+		/* Node not bootstrapped yet */
+		objp = fallback_alloc(cachep, flags);
+		goto out;
 	}
-	objp = ____cache_alloc(cache, flags);
 
 	/*
 	 * We may just have run out of memory on the local node.
 	 * ____cache_alloc_node() knows how to locate memory on other nodes
 	 */
 	if (!objp)
-		objp = ____cache_alloc_node(cache, flags, numa_mem_id());
-
+		objp = ____cache_alloc_node(cachep, flags, nodeid);
 out:
 	return objp;
 }
 #else
 
 static __always_inline void *
-__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
+__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid __maybe_unused)
 {
 	return ____cache_alloc(cachep, flags);
 }
@@ -3292,7 +3285,7 @@ slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
 
 	cache_alloc_debugcheck_before(cachep, flags);
 	local_irq_save(save_flags);
-	objp = __do_cache_alloc(cachep, flags);
+	objp = __do_cache_alloc(cachep, flags, NUMA_NO_NODE);
 	local_irq_restore(save_flags);
 	objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
 	prefetchw(objp);
@@ -3531,7 +3524,8 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 
 	local_irq_disable();
 	for (i = 0; i < size; i++) {
-		void *objp = kfence_alloc(s, s->object_size, flags) ?: __do_cache_alloc(s, flags);
+		void *objp = kfence_alloc(s, s->object_size, flags) ?:
+			     __do_cache_alloc(s, flags, NUMA_NO_NODE);
 
 		if (unlikely(!objp))
 			goto error;
-- 
2.34.1


* [PATCH v3 2/15] mm/slab: cleanup slab_alloc() and slab_alloc_node()
  2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
  2022-07-12 13:39 ` [PATCH v3 1/15] mm/slab: move NUMA-related code to __do_cache_alloc() Hyeonggon Yoo
@ 2022-07-12 13:39 ` Hyeonggon Yoo
  2022-07-12 13:39 ` [PATCH v3 03/15] mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions Hyeonggon Yoo
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-12 13:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo,
	Joe Perches, Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

Make slab_alloc_node() available even when CONFIG_NUMA=n and
make slab_alloc() a wrapper of slab_alloc_node().

This is necessary for further cleanup.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab.c | 50 +++++++++++++-------------------------------------
 1 file changed, 13 insertions(+), 37 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 3d83d17ff3b3..5bcd2b62b5a2 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3186,38 +3186,6 @@ static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
 	return obj ? obj : fallback_alloc(cachep, flags);
 }
 
-static void *__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid);
-
-static __always_inline void *
-slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_size,
-		   unsigned long caller)
-{
-	unsigned long save_flags;
-	void *ptr;
-	struct obj_cgroup *objcg = NULL;
-	bool init = false;
-
-	flags &= gfp_allowed_mask;
-	cachep = slab_pre_alloc_hook(cachep, NULL, &objcg, 1, flags);
-	if (unlikely(!cachep))
-		return NULL;
-
-	ptr = kfence_alloc(cachep, orig_size, flags);
-	if (unlikely(ptr))
-		goto out_hooks;
-
-	cache_alloc_debugcheck_before(cachep, flags);
-	local_irq_save(save_flags);
-	ptr = __do_cache_alloc(cachep, flags, nodeid);
-	local_irq_restore(save_flags);
-	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
-	init = slab_want_init_on_alloc(flags, cachep);
-
-out_hooks:
-	slab_post_alloc_hook(cachep, objcg, flags, 1, &ptr, init);
-	return ptr;
-}
-
 static __always_inline void *
 __do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 {
@@ -3266,8 +3234,8 @@ __do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid __maybe_unus
 #endif /* CONFIG_NUMA */
 
 static __always_inline void *
-slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
-	   size_t orig_size, unsigned long caller)
+slab_alloc_node(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
+		int nodeid, size_t orig_size, unsigned long caller)
 {
 	unsigned long save_flags;
 	void *objp;
@@ -3285,7 +3253,7 @@ slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
 
 	cache_alloc_debugcheck_before(cachep, flags);
 	local_irq_save(save_flags);
-	objp = __do_cache_alloc(cachep, flags, NUMA_NO_NODE);
+	objp = __do_cache_alloc(cachep, flags, nodeid);
 	local_irq_restore(save_flags);
 	objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
 	prefetchw(objp);
@@ -3296,6 +3264,14 @@ slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
 	return objp;
 }
 
+static __always_inline void *
+slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
+	   size_t orig_size, unsigned long caller)
+{
+	return slab_alloc_node(cachep, lru, flags, NUMA_NO_NODE, orig_size,
+			       caller);
+}
+
 /*
  * Caller needs to acquire correct kmem_cache_node's list_lock
  * @list: List of detached free slabs should be freed by caller
@@ -3584,7 +3560,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_trace);
  */
 void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 {
-	void *ret = slab_alloc_node(cachep, flags, nodeid, cachep->object_size, _RET_IP_);
+	void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);
 
 	trace_kmem_cache_alloc_node(_RET_IP_, ret, cachep,
 				    cachep->object_size, cachep->size,
@@ -3602,7 +3578,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
 {
 	void *ret;
 
-	ret = slab_alloc_node(cachep, flags, nodeid, size, _RET_IP_);
+	ret = slab_alloc_node(cachep, NULL, flags, nodeid, size, _RET_IP_);
 
 	ret = kasan_kmalloc(cachep, ret, size, flags);
 	trace_kmalloc_node(_RET_IP_, ret, cachep,
-- 
2.34.1


* [PATCH v3 03/15] mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions
  2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
  2022-07-12 13:39 ` [PATCH v3 1/15] mm/slab: move NUMA-related code to __do_cache_alloc() Hyeonggon Yoo
  2022-07-12 13:39 ` [PATCH v3 2/15] mm/slab: cleanup slab_alloc() and slab_alloc_node() Hyeonggon Yoo
@ 2022-07-12 13:39 ` Hyeonggon Yoo
  2022-07-12 13:39 ` [PATCH v3 04/15] mm/slab_common: cleanup kmalloc_track_caller() Hyeonggon Yoo
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-12 13:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo,
	Joe Perches, Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

Now that slab_alloc_node() is available for SLAB even when CONFIG_NUMA=n,
remove the CONFIG_NUMA ifdefs from the common kmalloc functions.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slab.h | 28 ----------------------------
 mm/slab.c            |  2 --
 mm/slob.c            |  5 +----
 mm/slub.c            |  6 ------
 4 files changed, 1 insertion(+), 40 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0fefdf528e0d..4754c834b0e3 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -456,38 +456,18 @@ static __always_inline void kfree_bulk(size_t size, void **p)
 	kmem_cache_free_bulk(NULL, size, p);
 }
 
-#ifdef CONFIG_NUMA
 void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_kmalloc_alignment
 							 __alloc_size(1);
 void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assume_slab_alignment
 									 __malloc;
-#else
-static __always_inline __alloc_size(1) void *__kmalloc_node(size_t size, gfp_t flags, int node)
-{
-	return __kmalloc(size, flags);
-}
-
-static __always_inline void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node)
-{
-	return kmem_cache_alloc(s, flags);
-}
-#endif
 
 #ifdef CONFIG_TRACING
 extern void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
 				   __assume_slab_alignment __alloc_size(3);
 
-#ifdef CONFIG_NUMA
 extern void *kmem_cache_alloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
 					 int node, size_t size) __assume_slab_alignment
 								__alloc_size(4);
-#else
-static __always_inline __alloc_size(4) void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
-						 gfp_t gfpflags, int node, size_t size)
-{
-	return kmem_cache_alloc_trace(s, gfpflags, size);
-}
-#endif /* CONFIG_NUMA */
 
 #else /* CONFIG_TRACING */
 static __always_inline __alloc_size(3) void *kmem_cache_alloc_trace(struct kmem_cache *s,
@@ -701,20 +681,12 @@ static inline __alloc_size(1, 2) void *kcalloc_node(size_t n, size_t size, gfp_t
 }
 
 
-#ifdef CONFIG_NUMA
 extern void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
 					 unsigned long caller) __alloc_size(1);
 #define kmalloc_node_track_caller(size, flags, node) \
 	__kmalloc_node_track_caller(size, flags, node, \
 			_RET_IP_)
 
-#else /* CONFIG_NUMA */
-
-#define kmalloc_node_track_caller(size, flags, node) \
-	kmalloc_track_caller(size, flags)
-
-#endif /* CONFIG_NUMA */
-
 /*
  * Shortcuts
  */
diff --git a/mm/slab.c b/mm/slab.c
index 5ca8bb7335dc..81804921c538 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3535,7 +3535,6 @@ kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
 EXPORT_SYMBOL(kmem_cache_alloc_trace);
 #endif
 
-#ifdef CONFIG_NUMA
 /**
  * kmem_cache_alloc_node - Allocate an object on the specified node
  * @cachep: The cache to allocate from.
@@ -3609,7 +3608,6 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
 	return __do_kmalloc_node(size, flags, node, caller);
 }
 EXPORT_SYMBOL(__kmalloc_node_track_caller);
-#endif /* CONFIG_NUMA */
 
 #ifdef CONFIG_PRINTK
 void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
diff --git a/mm/slob.c b/mm/slob.c
index 56421fe461c4..c54aad6b106c 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -536,14 +536,12 @@ void *__kmalloc_track_caller(size_t size, gfp_t gfp, unsigned long caller)
 }
 EXPORT_SYMBOL(__kmalloc_track_caller);
 
-#ifdef CONFIG_NUMA
 void *__kmalloc_node_track_caller(size_t size, gfp_t gfp,
 					int node, unsigned long caller)
 {
 	return __do_kmalloc_node(size, gfp, node, caller);
 }
 EXPORT_SYMBOL(__kmalloc_node_track_caller);
-#endif
 
 void kfree(const void *block)
 {
@@ -647,7 +645,7 @@ void *kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru, gfp_
 	return slob_alloc_node(cachep, flags, NUMA_NO_NODE);
 }
 EXPORT_SYMBOL(kmem_cache_alloc_lru);
-#ifdef CONFIG_NUMA
+
 void *__kmalloc_node(size_t size, gfp_t gfp, int node)
 {
 	return __do_kmalloc_node(size, gfp, node, _RET_IP_);
@@ -659,7 +657,6 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t gfp, int node)
 	return slob_alloc_node(cachep, gfp, node);
 }
 EXPORT_SYMBOL(kmem_cache_alloc_node);
-#endif
 
 static void __kmem_cache_free(void *b, int size)
 {
diff --git a/mm/slub.c b/mm/slub.c
index 26b00951aad1..a5b3a4484263 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3287,7 +3287,6 @@ void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
 EXPORT_SYMBOL(kmem_cache_alloc_trace);
 #endif
 
-#ifdef CONFIG_NUMA
 void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
 {
 	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
@@ -3314,7 +3313,6 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
 }
 EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
 #endif
-#endif	/* CONFIG_NUMA */
 
 /*
  * Slow path handling. This may still be called frequently since objects
@@ -4427,7 +4425,6 @@ void *__kmalloc(size_t size, gfp_t flags)
 }
 EXPORT_SYMBOL(__kmalloc);
 
-#ifdef CONFIG_NUMA
 static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 {
 	struct page *page;
@@ -4474,7 +4471,6 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
 	return ret;
 }
 EXPORT_SYMBOL(__kmalloc_node);
-#endif	/* CONFIG_NUMA */
 
 #ifdef CONFIG_HARDENED_USERCOPY
 /*
@@ -4930,7 +4926,6 @@ void *__kmalloc_track_caller(size_t size, gfp_t gfpflags, unsigned long caller)
 }
 EXPORT_SYMBOL(__kmalloc_track_caller);
 
-#ifdef CONFIG_NUMA
 void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
 					int node, unsigned long caller)
 {
@@ -4960,7 +4955,6 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
 	return ret;
 }
 EXPORT_SYMBOL(__kmalloc_node_track_caller);
-#endif
 
 #ifdef CONFIG_SYSFS
 static int count_inuse(struct slab *slab)
-- 
2.34.1


* [PATCH v3 04/15] mm/slab_common: cleanup kmalloc_track_caller()
  2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
                   ` (2 preceding siblings ...)
  2022-07-12 13:39 ` [PATCH v3 03/15] mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions Hyeonggon Yoo
@ 2022-07-12 13:39 ` Hyeonggon Yoo
  2022-07-12 13:39 ` [PATCH v3 05/15] mm/sl[au]b: factor out __do_kmalloc_node() Hyeonggon Yoo
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-12 13:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo,
	Joe Perches, Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

Make kmalloc_track_caller() a wrapper of kmalloc_node_track_caller().

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slab.h | 17 ++++++++---------
 mm/slab.c            |  6 ------
 mm/slob.c            |  6 ------
 mm/slub.c            | 22 ----------------------
 4 files changed, 8 insertions(+), 43 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 4754c834b0e3..a0e57df3d5a4 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -651,6 +651,12 @@ static inline __alloc_size(1, 2) void *kcalloc(size_t n, size_t size, gfp_t flag
 	return kmalloc_array(n, size, flags | __GFP_ZERO);
 }
 
+void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
+				  unsigned long caller) __alloc_size(1);
+#define kmalloc_node_track_caller(size, flags, node) \
+	__kmalloc_node_track_caller(size, flags, node, \
+				    _RET_IP_)
+
 /*
  * kmalloc_track_caller is a special version of kmalloc that records the
  * calling function of the routine calling it for slab leak tracking instead
@@ -659,9 +665,9 @@ static inline __alloc_size(1, 2) void *kcalloc(size_t n, size_t size, gfp_t flag
  * allocator where we care about the real place the memory allocation
  * request comes from.
  */
-extern void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller);
 #define kmalloc_track_caller(size, flags) \
-	__kmalloc_track_caller(size, flags, _RET_IP_)
+	__kmalloc_node_track_caller(size, flags, \
+				    NUMA_NO_NODE, _RET_IP_)
 
 static inline __alloc_size(1, 2) void *kmalloc_array_node(size_t n, size_t size, gfp_t flags,
 							  int node)
@@ -680,13 +686,6 @@ static inline __alloc_size(1, 2) void *kcalloc_node(size_t n, size_t size, gfp_t
 	return kmalloc_array_node(n, size, flags | __GFP_ZERO, node);
 }
 
-
-extern void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
-					 unsigned long caller) __alloc_size(1);
-#define kmalloc_node_track_caller(size, flags, node) \
-	__kmalloc_node_track_caller(size, flags, node, \
-			_RET_IP_)
-
 /*
  * Shortcuts
  */
diff --git a/mm/slab.c b/mm/slab.c
index 81804921c538..da2f6a5dd8fa 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3665,12 +3665,6 @@ void *__kmalloc(size_t size, gfp_t flags)
 }
 EXPORT_SYMBOL(__kmalloc);
 
-void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller)
-{
-	return __do_kmalloc(size, flags, caller);
-}
-EXPORT_SYMBOL(__kmalloc_track_caller);
-
 /**
  * kmem_cache_free - Deallocate an object
  * @cachep: The cache the allocation was from.
diff --git a/mm/slob.c b/mm/slob.c
index c54aad6b106c..80cdbe4f0d67 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -530,12 +530,6 @@ void *__kmalloc(size_t size, gfp_t gfp)
 }
 EXPORT_SYMBOL(__kmalloc);
 
-void *__kmalloc_track_caller(size_t size, gfp_t gfp, unsigned long caller)
-{
-	return __do_kmalloc_node(size, gfp, NUMA_NO_NODE, caller);
-}
-EXPORT_SYMBOL(__kmalloc_track_caller);
-
 void *__kmalloc_node_track_caller(size_t size, gfp_t gfp,
 					int node, unsigned long caller)
 {
diff --git a/mm/slub.c b/mm/slub.c
index a5b3a4484263..7c284535a62b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4904,28 +4904,6 @@ int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags)
 	return 0;
 }
 
-void *__kmalloc_track_caller(size_t size, gfp_t gfpflags, unsigned long caller)
-{
-	struct kmem_cache *s;
-	void *ret;
-
-	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
-		return kmalloc_large(size, gfpflags);
-
-	s = kmalloc_slab(size, gfpflags);
-
-	if (unlikely(ZERO_OR_NULL_PTR(s)))
-		return s;
-
-	ret = slab_alloc(s, NULL, gfpflags, caller, size);
-
-	/* Honor the call site pointer we received. */
-	trace_kmalloc(caller, ret, s, size, s->size, gfpflags);
-
-	return ret;
-}
-EXPORT_SYMBOL(__kmalloc_track_caller);
-
 void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
 					int node, unsigned long caller)
 {
-- 
2.34.1


* [PATCH v3 05/15] mm/sl[au]b: factor out __do_kmalloc_node()
  2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
                   ` (3 preceding siblings ...)
  2022-07-12 13:39 ` [PATCH v3 04/15] mm/slab_common: cleanup kmalloc_track_caller() Hyeonggon Yoo
@ 2022-07-12 13:39 ` Hyeonggon Yoo
  2022-07-28 14:45   ` Vlastimil Babka
  2022-07-12 13:39 ` [PATCH v3 06/15] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large() Hyeonggon Yoo
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-12 13:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo,
	Joe Perches, Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

__kmalloc(), __kmalloc_node(), and __kmalloc_node_track_caller()
mostly do the same job. Factor out the common code into
__do_kmalloc_node().

Note that this patch also fixes a missing kasan_kmalloc() call in SLUB's
__kmalloc_node_track_caller().

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/slab.c | 30 +----------------------
 mm/slub.c | 71 +++++++++++++++----------------------------------------
 2 files changed, 20 insertions(+), 81 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index da2f6a5dd8fa..ab34727d61b2 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3631,37 +3631,9 @@ void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
 }
 #endif
 
-/**
- * __do_kmalloc - allocate memory
- * @size: how many bytes of memory are required.
- * @flags: the type of memory to allocate (see kmalloc).
- * @caller: function caller for debug tracking of the caller
- *
- * Return: pointer to the allocated memory or %NULL in case of error
- */
-static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
-					  unsigned long caller)
-{
-	struct kmem_cache *cachep;
-	void *ret;
-
-	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
-		return NULL;
-	cachep = kmalloc_slab(size, flags);
-	if (unlikely(ZERO_OR_NULL_PTR(cachep)))
-		return cachep;
-	ret = slab_alloc(cachep, NULL, flags, size, caller);
-
-	ret = kasan_kmalloc(cachep, ret, size, flags);
-	trace_kmalloc(caller, ret, cachep,
-		      size, cachep->size, flags);
-
-	return ret;
-}
-
 void *__kmalloc(size_t size, gfp_t flags)
 {
-	return __do_kmalloc(size, flags, _RET_IP_);
+	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
 }
 EXPORT_SYMBOL(__kmalloc);
 
diff --git a/mm/slub.c b/mm/slub.c
index 7c284535a62b..2ccc473e0ae7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4402,29 +4402,6 @@ static int __init setup_slub_min_objects(char *str)
 
 __setup("slub_min_objects=", setup_slub_min_objects);
 
-void *__kmalloc(size_t size, gfp_t flags)
-{
-	struct kmem_cache *s;
-	void *ret;
-
-	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
-		return kmalloc_large(size, flags);
-
-	s = kmalloc_slab(size, flags);
-
-	if (unlikely(ZERO_OR_NULL_PTR(s)))
-		return s;
-
-	ret = slab_alloc(s, NULL, flags, _RET_IP_, size);
-
-	trace_kmalloc(_RET_IP_, ret, s, size, s->size, flags);
-
-	ret = kasan_kmalloc(s, ret, size, flags);
-
-	return ret;
-}
-EXPORT_SYMBOL(__kmalloc);
-
 static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 {
 	struct page *page;
@@ -4442,7 +4419,8 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 	return kmalloc_large_node_hook(ptr, size, flags);
 }
 
-void *__kmalloc_node(size_t size, gfp_t flags, int node)
+static __always_inline
+void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
 {
 	struct kmem_cache *s;
 	void *ret;
@@ -4450,7 +4428,7 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
 	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
 		ret = kmalloc_large_node(size, flags, node);
 
-		trace_kmalloc_node(_RET_IP_, ret, NULL,
+		trace_kmalloc_node(caller, ret, NULL,
 				   size, PAGE_SIZE << get_order(size),
 				   flags, node);
 
@@ -4462,16 +4440,28 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
 	if (unlikely(ZERO_OR_NULL_PTR(s)))
 		return s;
 
-	ret = slab_alloc_node(s, NULL, flags, node, _RET_IP_, size);
+	ret = slab_alloc_node(s, NULL, flags, node, caller, size);
 
-	trace_kmalloc_node(_RET_IP_, ret, s, size, s->size, flags, node);
+	trace_kmalloc_node(caller, ret, s, size, s->size, flags, node);
 
 	ret = kasan_kmalloc(s, ret, size, flags);
 
 	return ret;
 }
+
+void *__kmalloc_node(size_t size, gfp_t flags, int node)
+{
+	return __do_kmalloc_node(size, flags, node, _RET_IP_);
+}
 EXPORT_SYMBOL(__kmalloc_node);
 
+void *__kmalloc(size_t size, gfp_t flags)
+{
+	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
+}
+EXPORT_SYMBOL(__kmalloc);
+
+
 #ifdef CONFIG_HARDENED_USERCOPY
 /*
  * Rejects incorrectly sized objects and objects that are to be copied
@@ -4905,32 +4895,9 @@ int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags)
 }
 
 void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
-					int node, unsigned long caller)
+				  int node, unsigned long caller)
 {
-	struct kmem_cache *s;
-	void *ret;
-
-	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
-		ret = kmalloc_large_node(size, gfpflags, node);
-
-		trace_kmalloc_node(caller, ret, NULL,
-				   size, PAGE_SIZE << get_order(size),
-				   gfpflags, node);
-
-		return ret;
-	}
-
-	s = kmalloc_slab(size, gfpflags);
-
-	if (unlikely(ZERO_OR_NULL_PTR(s)))
-		return s;
-
-	ret = slab_alloc_node(s, NULL, gfpflags, node, caller, size);
-
-	/* Honor the call site pointer we received. */
-	trace_kmalloc_node(caller, ret, s, size, s->size, gfpflags, node);
-
-	return ret;
+	return __do_kmalloc_node(size, gfpflags, node, caller);
 }
 EXPORT_SYMBOL(__kmalloc_node_track_caller);
 
-- 
2.34.1


* [PATCH v3 06/15] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large()
  2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
                   ` (4 preceding siblings ...)
  2022-07-12 13:39 ` [PATCH v3 05/15] mm/sl[au]b: factor out __do_kmalloc_node() Hyeonggon Yoo
@ 2022-07-12 13:39 ` Hyeonggon Yoo
  2022-07-28 15:23   ` Vlastimil Babka
  2022-07-12 13:39 ` [PATCH v3 07/15] mm/slub: move kmalloc_large_node() to slab_common.c Hyeonggon Yoo
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-12 13:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo,
	Joe Perches, Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

There is no caller of kmalloc_order_trace() except kmalloc_large().
Fold it into kmalloc_large() and remove kmalloc_order{,_trace}().

Also move the tracepoint that was previously in kmalloc_order_trace()
into kmalloc_large().
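
As a hypothetical call site (buf is a placeholder; with SLUB,
KMALLOC_MAX_CACHE_SIZE is two pages, so the constant size below takes
the large path):

/*
 * The inline kmalloc() wrapper in <linux/slab.h> sees a constant size
 * above KMALLOC_MAX_CACHE_SIZE and calls kmalloc_large() directly; the
 * tracepoint formerly emitted by kmalloc_order_trace() is now emitted
 * inside kmalloc_large() itself.
 */
void *buf = kmalloc(SZ_128K, GFP_KERNEL);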

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slab.h | 22 ++--------------------
 mm/slab_common.c     | 14 +++-----------
 2 files changed, 5 insertions(+), 31 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index a0e57df3d5a4..15a4c59da59e 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -489,26 +489,8 @@ static __always_inline void *kmem_cache_alloc_node_trace(struct kmem_cache *s, g
 }
 #endif /* CONFIG_TRACING */
 
-extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) __assume_page_alignment
-									 __alloc_size(1);
-
-#ifdef CONFIG_TRACING
-extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
-				__assume_page_alignment __alloc_size(1);
-#else
-static __always_inline __alloc_size(1) void *kmalloc_order_trace(size_t size, gfp_t flags,
-								 unsigned int order)
-{
-	return kmalloc_order(size, flags, order);
-}
-#endif
-
-static __always_inline __alloc_size(1) void *kmalloc_large(size_t size, gfp_t flags)
-{
-	unsigned int order = get_order(size);
-	return kmalloc_order_trace(size, flags, order);
-}
-
+void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment
+					      __alloc_size(1);
 /**
  * kmalloc - allocate memory
  * @size: how many bytes of memory are required.
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 6c9aac5d8f4a..1f8af2106df0 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -932,10 +932,11 @@ gfp_t kmalloc_fix_flags(gfp_t flags)
  * directly to the page allocator. We use __GFP_COMP, because we will need to
  * know the allocation order to free the pages properly in kfree.
  */
-void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
+void *kmalloc_large(size_t size, gfp_t flags)
 {
 	void *ret = NULL;
 	struct page *page;
+	unsigned int order = get_order(size);
 
 	if (unlikely(flags & GFP_SLAB_BUG_MASK))
 		flags = kmalloc_fix_flags(flags);
@@ -950,19 +951,10 @@ void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
 	ret = kasan_kmalloc_large(ret, size, flags);
 	/* As ret might get tagged, call kmemleak hook after KASAN. */
 	kmemleak_alloc(ret, size, 1, flags);
-	return ret;
-}
-EXPORT_SYMBOL(kmalloc_order);
-
-#ifdef CONFIG_TRACING
-void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
-{
-	void *ret = kmalloc_order(size, flags, order);
 	trace_kmalloc(_RET_IP_, ret, NULL, size, PAGE_SIZE << order, flags);
 	return ret;
 }
-EXPORT_SYMBOL(kmalloc_order_trace);
-#endif
+EXPORT_SYMBOL(kmalloc_large);
 
 #ifdef CONFIG_SLAB_FREELIST_RANDOM
 /* Randomize a generic freelist */
-- 
2.34.1


* [PATCH v3 07/15] mm/slub: move kmalloc_large_node() to slab_common.c
  2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
                   ` (5 preceding siblings ...)
  2022-07-12 13:39 ` [PATCH v3 06/15] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large() Hyeonggon Yoo
@ 2022-07-12 13:39 ` Hyeonggon Yoo
  2022-07-12 13:39 ` [PATCH v3 08/15] mm/slab_common: kmalloc_node: pass large requests to page allocator Hyeonggon Yoo
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-12 13:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo,
	Joe Perches, Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

In a later patch, SLAB will also pass requests larger than an order-1
page to the page allocator. Move kmalloc_large_node() to slab_common.c
in preparation.

Fold kmalloc_large_node_hook() into kmalloc_large_node(), as there is
no other caller.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slab.h |  4 ++++
 mm/slab_common.c     | 22 ++++++++++++++++++++++
 mm/slub.c            | 25 -------------------------
 3 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 15a4c59da59e..082499306098 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -491,6 +491,10 @@ static __always_inline void *kmem_cache_alloc_node_trace(struct kmem_cache *s, g
 
 void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment
 					      __alloc_size(1);
+
+void *kmalloc_large_node(size_t size, gfp_t flags, int node) __assume_page_alignment
+							     __alloc_size(1);
+
 /**
  * kmalloc - allocate memory
  * @size: how many bytes of memory are required.
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1f8af2106df0..6f855587b635 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -956,6 +956,28 @@ void *kmalloc_large(size_t size, gfp_t flags)
 }
 EXPORT_SYMBOL(kmalloc_large);
 
+void *kmalloc_large_node(size_t size, gfp_t flags, int node)
+{
+	struct page *page;
+	void *ptr = NULL;
+	unsigned int order = get_order(size);
+
+	flags |= __GFP_COMP;
+	page = alloc_pages_node(node, flags, order);
+	if (page) {
+		ptr = page_address(page);
+		mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
+				      PAGE_SIZE << order);
+	}
+
+	ptr = kasan_kmalloc_large(ptr, size, flags);
+	/* As ptr might get tagged, call kmemleak hook after KASAN. */
+	kmemleak_alloc(ptr, size, 1, flags);
+
+	return ptr;
+}
+EXPORT_SYMBOL(kmalloc_large_node);
+
 #ifdef CONFIG_SLAB_FREELIST_RANDOM
 /* Randomize a generic freelist */
 static void freelist_randomize(struct rnd_state *state, unsigned int *list,
diff --git a/mm/slub.c b/mm/slub.c
index 2ccc473e0ae7..f22a84dd27de 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1704,14 +1704,6 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
  * Hooks for other subsystems that check memory allocations. In a typical
  * production configuration these hooks all should produce no code at all.
  */
-static inline void *kmalloc_large_node_hook(void *ptr, size_t size, gfp_t flags)
-{
-	ptr = kasan_kmalloc_large(ptr, size, flags);
-	/* As ptr might get tagged, call kmemleak hook after KASAN. */
-	kmemleak_alloc(ptr, size, 1, flags);
-	return ptr;
-}
-
 static __always_inline void kfree_hook(void *x)
 {
 	kmemleak_free(x);
@@ -4402,23 +4394,6 @@ static int __init setup_slub_min_objects(char *str)
 
 __setup("slub_min_objects=", setup_slub_min_objects);
 
-static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
-{
-	struct page *page;
-	void *ptr = NULL;
-	unsigned int order = get_order(size);
-
-	flags |= __GFP_COMP;
-	page = alloc_pages_node(node, flags, order);
-	if (page) {
-		ptr = page_address(page);
-		mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
-				      PAGE_SIZE << order);
-	}
-
-	return kmalloc_large_node_hook(ptr, size, flags);
-}
-
 static __always_inline
 void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
 {
-- 
2.34.1


* [PATCH v3 08/15] mm/slab_common: kmalloc_node: pass large requests to page allocator
  2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
                   ` (6 preceding siblings ...)
  2022-07-12 13:39 ` [PATCH v3 07/15] mm/slub: move kmalloc_large_node() to slab_common.c Hyeonggon Yoo
@ 2022-07-12 13:39 ` Hyeonggon Yoo
  2022-07-28 16:09   ` Vlastimil Babka
  2022-07-12 13:39 ` [PATCH v3 09/15] mm/slab_common: cleanup kmalloc_large() Hyeonggon Yoo
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-12 13:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo,
	Joe Perches, Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

Now that kmalloc_large_node() is in the common code, make kmalloc_node()
pass large requests to the page allocator via kmalloc_large_node().

One problem is that there is currently no tracepoint in
kmalloc_large_node(). Instead of simply adding one there, use
kmalloc_large_node{,_notrace}() depending on the caller, so that a
useful caller address is reported both for inlined kmalloc_node() calls
and for __kmalloc_node_track_caller() when large objects are allocated.
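
As an illustration of the two paths (hypothetical call sites; buf and
nid are placeholders, and the constant size is assumed to exceed
KMALLOC_MAX_CACHE_SIZE):

/*
 * The inline kmalloc_node() wrapper calls the traced kmalloc_large_node()
 * directly, so the tracepoint records this call site via _RET_IP_.
 */
buf = kmalloc_node(4 * PAGE_SIZE, GFP_KERNEL, nid);

/*
 * __kmalloc_node_track_caller() goes through __do_kmalloc_node(), which
 * uses kmalloc_large_node_notrace() and emits the tracepoint itself with
 * the caller address it was handed, so the original call site is still
 * reported.
 */
buf = kmalloc_node_track_caller(4 * PAGE_SIZE, GFP_KERNEL, nid);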

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 v3:
	This patch is new in v3. It avoids reporting a useless caller
	in __kmalloc_node_track_caller() when kmalloc_large_node()
	is called.

 include/linux/slab.h | 26 +++++++++++++++++++-------
 mm/slab.h            |  2 ++
 mm/slab_common.c     | 17 ++++++++++++++++-
 mm/slub.c            |  2 +-
 4 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 082499306098..fd2e129fc813 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -571,23 +571,35 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
 	return __kmalloc(size, flags);
 }
 
+#ifndef CONFIG_SLOB
 static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
 {
-#ifndef CONFIG_SLOB
-	if (__builtin_constant_p(size) &&
-		size <= KMALLOC_MAX_CACHE_SIZE) {
-		unsigned int i = kmalloc_index(size);
+	if (__builtin_constant_p(size)) {
+		unsigned int index;
 
-		if (!i)
+		if (size > KMALLOC_MAX_CACHE_SIZE)
+			return kmalloc_large_node(size, flags, node);
+
+		index = kmalloc_index(size);
+
+		if (!index)
 			return ZERO_SIZE_PTR;
 
 		return kmem_cache_alloc_node_trace(
-				kmalloc_caches[kmalloc_type(flags)][i],
+				kmalloc_caches[kmalloc_type(flags)][index],
 						flags, node, size);
 	}
-#endif
 	return __kmalloc_node(size, flags, node);
 }
+#else
+static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
+{
+	if (__builtin_constant_p(size) && size > KMALLOC_MAX_CACHE_SIZE)
+		return kmalloc_large_node(size, flags, node);
+
+	return __kmalloc_node(size, flags, node);
+}
+#endif
 
 /**
  * kmalloc_array - allocate memory for an array.
diff --git a/mm/slab.h b/mm/slab.h
index a8d5eb1c323f..7cb51ff44f0c 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -273,6 +273,8 @@ void create_kmalloc_caches(slab_flags_t);
 
 /* Find the kmalloc slab corresponding for a certain size */
 struct kmem_cache *kmalloc_slab(size_t, gfp_t);
+
+void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node);
 #endif
 
 gfp_t kmalloc_fix_flags(gfp_t flags);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 6f855587b635..dc872e0ef0fc 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -956,7 +956,8 @@ void *kmalloc_large(size_t size, gfp_t flags)
 }
 EXPORT_SYMBOL(kmalloc_large);
 
-void *kmalloc_large_node(size_t size, gfp_t flags, int node)
+static __always_inline
+void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
 {
 	struct page *page;
 	void *ptr = NULL;
@@ -976,6 +977,20 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 
 	return ptr;
 }
+
+void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
+{
+	return __kmalloc_large_node_notrace(size, flags, node);
+}
+
+void *kmalloc_large_node(size_t size, gfp_t flags, int node)
+{
+	void *ret = __kmalloc_large_node_notrace(size, flags, node);
+
+	trace_kmalloc_node(_RET_IP_, ret, NULL, size,
+			   PAGE_SIZE << get_order(size), flags, node);
+	return ret;
+}
 EXPORT_SYMBOL(kmalloc_large_node);
 
 #ifdef CONFIG_SLAB_FREELIST_RANDOM
diff --git a/mm/slub.c b/mm/slub.c
index f22a84dd27de..3d02cf44adf7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4401,7 +4401,7 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
 	void *ret;
 
 	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
-		ret = kmalloc_large_node(size, flags, node);
+		ret = kmalloc_large_node_notrace(size, flags, node);
 
 		trace_kmalloc_node(caller, ret, NULL,
 				   size, PAGE_SIZE << get_order(size),
-- 
2.34.1


* [PATCH v3 09/15] mm/slab_common: cleanup kmalloc_large()
  2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
                   ` (7 preceding siblings ...)
  2022-07-12 13:39 ` [PATCH v3 08/15] mm/slab_common: kmalloc_node: pass large requests to page allocator Hyeonggon Yoo
@ 2022-07-12 13:39 ` Hyeonggon Yoo
  2022-07-28 16:13   ` Vlastimil Babka
  2022-07-12 13:39 ` [PATCH v3 10/15] mm/slab: kmalloc: pass requests larger than order-1 page to page allocator Hyeonggon Yoo
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-12 13:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo,
	Joe Perches, Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

Now that kmalloc_large() and kmalloc_large_node() do mostly the same
job, make kmalloc_large() a wrapper of __kmalloc_large_node_notrace().

In the meantime, add the missing GFP flag fixup code to
__kmalloc_large_node_notrace().

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/slab_common.c | 36 +++++++++++++-----------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index dc872e0ef0fc..9c46e2f9589f 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -932,29 +932,6 @@ gfp_t kmalloc_fix_flags(gfp_t flags)
  * directly to the page allocator. We use __GFP_COMP, because we will need to
  * know the allocation order to free the pages properly in kfree.
  */
-void *kmalloc_large(size_t size, gfp_t flags)
-{
-	void *ret = NULL;
-	struct page *page;
-	unsigned int order = get_order(size);
-
-	if (unlikely(flags & GFP_SLAB_BUG_MASK))
-		flags = kmalloc_fix_flags(flags);
-
-	flags |= __GFP_COMP;
-	page = alloc_pages(flags, order);
-	if (likely(page)) {
-		ret = page_address(page);
-		mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
-				      PAGE_SIZE << order);
-	}
-	ret = kasan_kmalloc_large(ret, size, flags);
-	/* As ret might get tagged, call kmemleak hook after KASAN. */
-	kmemleak_alloc(ret, size, 1, flags);
-	trace_kmalloc(_RET_IP_, ret, NULL, size, PAGE_SIZE << order, flags);
-	return ret;
-}
-EXPORT_SYMBOL(kmalloc_large);
 
 static __always_inline
 void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
@@ -963,6 +940,9 @@ void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
 	void *ptr = NULL;
 	unsigned int order = get_order(size);
 
+	if (unlikely(flags & GFP_SLAB_BUG_MASK))
+		flags = kmalloc_fix_flags(flags);
+
 	flags |= __GFP_COMP;
 	page = alloc_pages_node(node, flags, order);
 	if (page) {
@@ -978,6 +958,16 @@ void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
 	return ptr;
 }
 
+void *kmalloc_large(size_t size, gfp_t flags)
+{
+	void *ret = __kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
+
+	trace_kmalloc(_RET_IP_, ret, NULL, size,
+		      PAGE_SIZE << get_order(size), flags);
+	return ret;
+}
+EXPORT_SYMBOL(kmalloc_large);
+
 void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
 {
 	return __kmalloc_large_node_notrace(size, flags, node);
-- 
2.34.1


* [PATCH v3 10/15] mm/slab: kmalloc: pass requests larger than order-1 page to page allocator
  2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
                   ` (8 preceding siblings ...)
  2022-07-12 13:39 ` [PATCH v3 09/15] mm/slab_common: cleanup kmalloc_large() Hyeonggon Yoo
@ 2022-07-12 13:39 ` Hyeonggon Yoo
  2022-07-28 16:25   ` Vlastimil Babka
  2022-07-12 13:39 ` [PATCH v3 11/15] mm/sl[au]b: introduce common alloc/free functions without tracepoint Hyeonggon Yoo
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-12 13:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo,
	Joe Perches, Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

There is not much benefit to serving large objects from kmalloc() slab
caches. Pass large requests to the page allocator, as SLUB already does,
for better maintenance of the common code.
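
For example (an illustrative snippet; sizes assume 4K pages, and
SZ_64K comes from <linux/sizes.h>):

/*
 * With this patch, a 64KB request no longer comes from a kmalloc-64k
 * slab cache under SLAB; it is served by the page allocator, and
 * kfree() frees it via free_large_kmalloc() after seeing that the
 * folio is not a slab.
 */
void *buf = kmalloc(SZ_64K, GFP_KERNEL);

kfree(buf);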

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
v2->v3:
	This patch is basically the same, but it now depends on
	kmalloc_large_node_notrace().

 include/linux/slab.h | 23 ++++-------------
 mm/slab.c            | 60 +++++++++++++++++++++++++++++++-------------
 mm/slab.h            |  3 +++
 mm/slab_common.c     | 25 ++++++++++++------
 mm/slub.c            | 19 --------------
 5 files changed, 68 insertions(+), 62 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index fd2e129fc813..4ee5b2fed164 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -243,27 +243,17 @@ static inline unsigned int arch_slab_minalign(void)
 
 #ifdef CONFIG_SLAB
 /*
- * The largest kmalloc size supported by the SLAB allocators is
- * 32 megabyte (2^25) or the maximum allocatable page order if that is
- * less than 32 MB.
- *
- * WARNING: Its not easy to increase this value since the allocators have
- * to do various tricks to work around compiler limitations in order to
- * ensure proper constant folding.
+ * SLAB and SLUB directly allocates requests fitting in to an order-1 page
+ * (PAGE_SIZE*2).  Larger requests are passed to the page allocator.
  */
-#define KMALLOC_SHIFT_HIGH	((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
-				(MAX_ORDER + PAGE_SHIFT - 1) : 25)
-#define KMALLOC_SHIFT_MAX	KMALLOC_SHIFT_HIGH
+#define KMALLOC_SHIFT_HIGH	(PAGE_SHIFT + 1)
+#define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT - 1)
 #ifndef KMALLOC_SHIFT_LOW
 #define KMALLOC_SHIFT_LOW	5
 #endif
 #endif
 
 #ifdef CONFIG_SLUB
-/*
- * SLUB directly allocates requests fitting in to an order-1 page
- * (PAGE_SIZE*2).  Larger requests are passed to the page allocator.
- */
 #define KMALLOC_SHIFT_HIGH	(PAGE_SHIFT + 1)
 #define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT - 1)
 #ifndef KMALLOC_SHIFT_LOW
@@ -415,10 +405,6 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
 	if (size <= 512 * 1024) return 19;
 	if (size <= 1024 * 1024) return 20;
 	if (size <=  2 * 1024 * 1024) return 21;
-	if (size <=  4 * 1024 * 1024) return 22;
-	if (size <=  8 * 1024 * 1024) return 23;
-	if (size <=  16 * 1024 * 1024) return 24;
-	if (size <=  32 * 1024 * 1024) return 25;
 
 	if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
 		BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
@@ -428,6 +414,7 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
 	/* Will never be reached. Needed because the compiler may complain */
 	return -1;
 }
+static_assert(PAGE_SHIFT <= 20);
 #define kmalloc_index(s) __kmalloc_index(s, true)
 #endif /* !CONFIG_SLOB */
 
diff --git a/mm/slab.c b/mm/slab.c
index ab34727d61b2..a2f43425a0ae 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3585,11 +3585,19 @@ __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
 	struct kmem_cache *cachep;
 	void *ret;
 
-	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
-		return NULL;
+	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
+		ret = kmalloc_large_node_notrace(size, flags, node);
+
+		trace_kmalloc_node(caller, ret, NULL, size,
+				   PAGE_SIZE << get_order(size),
+				   flags, node);
+		return ret;
+	}
+
 	cachep = kmalloc_slab(size, flags);
 	if (unlikely(ZERO_OR_NULL_PTR(cachep)))
 		return cachep;
+
 	ret = kmem_cache_alloc_node_trace(cachep, flags, node, size);
 	ret = kasan_kmalloc(cachep, ret, size, flags);
 
@@ -3664,17 +3672,27 @@ EXPORT_SYMBOL(kmem_cache_free);
 
 void kmem_cache_free_bulk(struct kmem_cache *orig_s, size_t size, void **p)
 {
-	struct kmem_cache *s;
-	size_t i;
 
 	local_irq_disable();
-	for (i = 0; i < size; i++) {
+	for (int i = 0; i < size; i++) {
 		void *objp = p[i];
+		struct kmem_cache *s;
 
-		if (!orig_s) /* called via kfree_bulk */
-			s = virt_to_cache(objp);
-		else
+		if (!orig_s) {
+			struct folio *folio = virt_to_folio(objp);
+
+			/* called via kfree_bulk */
+			if (!folio_test_slab(folio)) {
+				local_irq_enable();
+				free_large_kmalloc(folio, objp);
+				local_irq_disable();
+				continue;
+			}
+			s = folio_slab(folio)->slab_cache;
+		} else {
 			s = cache_from_obj(orig_s, objp);
+		}
+
 		if (!s)
 			continue;
 
@@ -3703,20 +3721,24 @@ void kfree(const void *objp)
 {
 	struct kmem_cache *c;
 	unsigned long flags;
+	struct folio *folio;
 
 	trace_kfree(_RET_IP_, objp);
 
 	if (unlikely(ZERO_OR_NULL_PTR(objp)))
 		return;
-	local_irq_save(flags);
-	kfree_debugcheck(objp);
-	c = virt_to_cache(objp);
-	if (!c) {
-		local_irq_restore(flags);
+
+	folio = virt_to_folio(objp);
+	if (!folio_test_slab(folio)) {
+		free_large_kmalloc(folio, (void *)objp);
 		return;
 	}
-	debug_check_no_locks_freed(objp, c->object_size);
 
+	c = folio_slab(folio)->slab_cache;
+
+	local_irq_save(flags);
+	kfree_debugcheck(objp);
+	debug_check_no_locks_freed(objp, c->object_size);
 	debug_check_no_obj_freed(objp, c->object_size);
 	__cache_free(c, (void *)objp, _RET_IP_);
 	local_irq_restore(flags);
@@ -4138,15 +4160,17 @@ void __check_heap_object(const void *ptr, unsigned long n,
 size_t __ksize(const void *objp)
 {
 	struct kmem_cache *c;
-	size_t size;
+	struct folio *folio;
 
 	BUG_ON(!objp);
 	if (unlikely(objp == ZERO_SIZE_PTR))
 		return 0;
 
-	c = virt_to_cache(objp);
-	size = c ? c->object_size : 0;
+	folio = virt_to_folio(objp);
+	if (!folio_test_slab(folio))
+		return folio_size(folio);
 
-	return size;
+	c = folio_slab(folio)->slab_cache;
+	return c->object_size;
 }
 EXPORT_SYMBOL(__ksize);
diff --git a/mm/slab.h b/mm/slab.h
index 7cb51ff44f0c..c81c92d421f1 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -669,6 +669,9 @@ static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
 		print_tracking(cachep, x);
 	return cachep;
 }
+
+void free_large_kmalloc(struct folio *folio, void *object);
+
 #endif /* CONFIG_SLOB */
 
 static inline size_t slab_ksize(const struct kmem_cache *s)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 9c46e2f9589f..0dff46fb7193 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -771,8 +771,8 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
 
 /*
  * kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot time.
- * kmalloc_index() supports up to 2^25=32MB, so the final entry of the table is
- * kmalloc-32M.
+ * kmalloc_index() supports up to 2^21=2MB, so the final entry of the table is
+ * kmalloc-2M.
  */
 const struct kmalloc_info_struct kmalloc_info[] __initconst = {
 	INIT_KMALLOC_INFO(0, 0),
@@ -796,11 +796,7 @@ const struct kmalloc_info_struct kmalloc_info[] __initconst = {
 	INIT_KMALLOC_INFO(262144, 256k),
 	INIT_KMALLOC_INFO(524288, 512k),
 	INIT_KMALLOC_INFO(1048576, 1M),
-	INIT_KMALLOC_INFO(2097152, 2M),
-	INIT_KMALLOC_INFO(4194304, 4M),
-	INIT_KMALLOC_INFO(8388608, 8M),
-	INIT_KMALLOC_INFO(16777216, 16M),
-	INIT_KMALLOC_INFO(33554432, 32M)
+	INIT_KMALLOC_INFO(2097152, 2M)
 };
 
 /*
@@ -913,6 +909,21 @@ void __init create_kmalloc_caches(slab_flags_t flags)
 	/* Kmalloc array is now usable */
 	slab_state = UP;
 }
+
+void free_large_kmalloc(struct folio *folio, void *object)
+{
+	unsigned int order = folio_order(folio);
+
+	if (WARN_ON_ONCE(order == 0))
+		pr_warn_once("object pointer: 0x%p\n", object);
+
+	kmemleak_free(object);
+	kasan_kfree_large(object);
+
+	mod_lruvec_page_state(folio_page(folio, 0), NR_SLAB_UNRECLAIMABLE_B,
+			      -(PAGE_SIZE << order));
+	__free_pages(folio_page(folio, 0), order);
+}
 #endif /* !CONFIG_SLOB */
 
 gfp_t kmalloc_fix_flags(gfp_t flags)
diff --git a/mm/slub.c b/mm/slub.c
index 3d02cf44adf7..6cb7ca27f3b7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1704,12 +1704,6 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
  * Hooks for other subsystems that check memory allocations. In a typical
  * production configuration these hooks all should produce no code at all.
  */
-static __always_inline void kfree_hook(void *x)
-{
-	kmemleak_free(x);
-	kasan_kfree_large(x);
-}
-
 static __always_inline bool slab_free_hook(struct kmem_cache *s,
 						void *x, bool init)
 {
@@ -3550,19 +3544,6 @@ struct detached_freelist {
 	struct kmem_cache *s;
 };
 
-static inline void free_large_kmalloc(struct folio *folio, void *object)
-{
-	unsigned int order = folio_order(folio);
-
-	if (WARN_ON_ONCE(order == 0))
-		pr_warn_once("object pointer: 0x%p\n", object);
-
-	kfree_hook(object);
-	mod_lruvec_page_state(folio_page(folio, 0), NR_SLAB_UNRECLAIMABLE_B,
-			      -(PAGE_SIZE << order));
-	__free_pages(folio_page(folio, 0), order);
-}
-
 /*
  * This function progressively scans the array with free objects (with
  * a limited look ahead) and extract objects belonging to the same
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 11/15] mm/sl[au]b: introduce common alloc/free functions without tracepoint
  2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
                   ` (9 preceding siblings ...)
  2022-07-12 13:39 ` [PATCH v3 10/15] mm/slab: kmalloc: pass requests larger than order-1 page to page allocator Hyeonggon Yoo
@ 2022-07-12 13:39 ` Hyeonggon Yoo
  2022-07-29  9:49   ` Vlastimil Babka
  2022-07-12 13:39 ` [PATCH v3 12/15] mm/sl[au]b: generalize kmalloc subsystem Hyeonggon Yoo
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-12 13:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo,
	Joe Perches, Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

To unify the kmalloc functions in a later patch, introduce common alloc/free
functions that do not have tracepoints.
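
As a rough sketch of the intended usage (condensed from a later patch in
this series, not from this diff), the common kmalloc code allocates and
frees through these helpers and emits the tracepoints itself:

	/* common allocation path: allocate untraced, then trace in common code */
	ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
	ret = kasan_kmalloc(s, ret, size, flags);
	trace_kmalloc_node(caller, ret, s, size, s->size, flags, node);

	/* common kfree() path: resolve the cache, then free untraced */
	s = folio_slab(folio)->slab_cache;
	__kmem_cache_free(s, (void *)object, _RET_IP_);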

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---

v3: 
   Tried to avoid affecting existing functions.

 mm/slab.c | 36 +++++++++++++++++++++++++++++-------
 mm/slab.h |  4 ++++
 mm/slub.c | 13 +++++++++++++
 3 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index a2f43425a0ae..375e35c14430 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3560,6 +3560,14 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 }
 EXPORT_SYMBOL(kmem_cache_alloc_node);
 
+void *__kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
+			     int nodeid, size_t orig_size,
+			     unsigned long caller)
+{
+	return slab_alloc_node(cachep, NULL, flags, nodeid,
+			       orig_size, caller);
+}
+
 #ifdef CONFIG_TRACING
 void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
 				  gfp_t flags,
@@ -3645,6 +3653,26 @@ void *__kmalloc(size_t size, gfp_t flags)
 }
 EXPORT_SYMBOL(__kmalloc);
 
+static __always_inline
+void __do_kmem_cache_free(struct kmem_cache *cachep, void *objp,
+			  unsigned long caller)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	debug_check_no_locks_freed(objp, cachep->object_size);
+	if (!(cachep->flags & SLAB_DEBUG_OBJECTS))
+		debug_check_no_obj_freed(objp, cachep->object_size);
+	__cache_free(cachep, objp, caller);
+	local_irq_restore(flags);
+}
+
+void __kmem_cache_free(struct kmem_cache *cachep, void *objp,
+		       unsigned long caller)
+{
+	__do_kmem_cache_free(cachep, objp, caller);
+}
+
 /**
  * kmem_cache_free - Deallocate an object
  * @cachep: The cache the allocation was from.
@@ -3655,18 +3683,12 @@ EXPORT_SYMBOL(__kmalloc);
  */
 void kmem_cache_free(struct kmem_cache *cachep, void *objp)
 {
-	unsigned long flags;
 	cachep = cache_from_obj(cachep, objp);
 	if (!cachep)
 		return;
 
 	trace_kmem_cache_free(_RET_IP_, objp, cachep->name);
-	local_irq_save(flags);
-	debug_check_no_locks_freed(objp, cachep->object_size);
-	if (!(cachep->flags & SLAB_DEBUG_OBJECTS))
-		debug_check_no_obj_freed(objp, cachep->object_size);
-	__cache_free(cachep, objp, _RET_IP_);
-	local_irq_restore(flags);
+	__do_kmem_cache_free(cachep, objp, _RET_IP_);
 }
 EXPORT_SYMBOL(kmem_cache_free);
 
diff --git a/mm/slab.h b/mm/slab.h
index c81c92d421f1..9193e9c1f040 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -275,6 +275,10 @@ void create_kmalloc_caches(slab_flags_t);
 struct kmem_cache *kmalloc_slab(size_t, gfp_t);
 
 void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node);
+void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
+			      int node, size_t orig_size,
+			      unsigned long caller);
+void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller);
 #endif
 
 gfp_t kmalloc_fix_flags(gfp_t flags);
diff --git a/mm/slub.c b/mm/slub.c
index 6cb7ca27f3b7..74eb78678c98 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3262,6 +3262,14 @@ void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
 }
 EXPORT_SYMBOL(kmem_cache_alloc_lru);
 
+void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
+			      int node, size_t orig_size,
+			      unsigned long caller)
+{
+	return slab_alloc_node(s, NULL, gfpflags, node,
+			       caller, orig_size);
+}
+
 #ifdef CONFIG_TRACING
 void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
 {
@@ -3526,6 +3534,11 @@ void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr)
 }
 #endif
 
+void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller)
+{
+	slab_free(s, virt_to_slab(x), x, NULL, &x, 1, caller);
+}
+
 void kmem_cache_free(struct kmem_cache *s, void *x)
 {
 	s = cache_from_obj(s, x);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 12/15] mm/sl[au]b: generalize kmalloc subsystem
  2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
                   ` (10 preceding siblings ...)
  2022-07-12 13:39 ` [PATCH v3 11/15] mm/sl[au]b: introduce common alloc/free functions without tracepoint Hyeonggon Yoo
@ 2022-07-12 13:39 ` Hyeonggon Yoo
  2022-07-29 10:25   ` Vlastimil Babka
  2022-07-12 13:39 ` [PATCH v3 13/15] mm/slab_common: unify NUMA and UMA version of tracepoints Hyeonggon Yoo
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-12 13:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo,
	Joe Perches, Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

Now everything in the kmalloc subsystem can be generalized.
Let's do it!

Generalize __do_kmalloc_node(), __kmalloc_node_track_caller(),
kfree(), __ksize(), __kmalloc(), __kmalloc_node() and move them
to slab_common.c.
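
As a rough sketch of the hunks below, the resulting call paths shared by
SLAB and SLUB are:

	/*
	 *  __kmalloc(size, flags)
	 *  __kmalloc_node(size, flags, node)
	 *  __kmalloc_node_track_caller(size, flags, node, caller)
	 *      -> __do_kmalloc_node()
	 *             -> kmalloc_large_node_notrace()  (size > KMALLOC_MAX_CACHE_SIZE)
	 *             -> __kmem_cache_alloc_node()     (otherwise, per-allocator)
	 *
	 *  kfree(object)
	 *      -> free_large_kmalloc()                 (!folio_test_slab(folio))
	 *      -> __kmem_cache_free()                  (otherwise, per-allocator)
	 */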

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/slab.c        | 108 -----------------------------------------------
 mm/slab_common.c | 102 ++++++++++++++++++++++++++++++++++++++++++++
 mm/slub.c        |  87 --------------------------------------
 3 files changed, 102 insertions(+), 195 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 375e35c14430..6407dad13d5c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3587,44 +3587,6 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
 EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
 #endif
 
-static __always_inline void *
-__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
-{
-	struct kmem_cache *cachep;
-	void *ret;
-
-	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
-		ret = kmalloc_large_node_notrace(size, flags, node);
-
-		trace_kmalloc_node(caller, ret, NULL, size,
-				   PAGE_SIZE << get_order(size),
-				   flags, node);
-		return ret;
-	}
-
-	cachep = kmalloc_slab(size, flags);
-	if (unlikely(ZERO_OR_NULL_PTR(cachep)))
-		return cachep;
-
-	ret = kmem_cache_alloc_node_trace(cachep, flags, node, size);
-	ret = kasan_kmalloc(cachep, ret, size, flags);
-
-	return ret;
-}
-
-void *__kmalloc_node(size_t size, gfp_t flags, int node)
-{
-	return __do_kmalloc_node(size, flags, node, _RET_IP_);
-}
-EXPORT_SYMBOL(__kmalloc_node);
-
-void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
-		int node, unsigned long caller)
-{
-	return __do_kmalloc_node(size, flags, node, caller);
-}
-EXPORT_SYMBOL(__kmalloc_node_track_caller);
-
 #ifdef CONFIG_PRINTK
 void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
 {
@@ -3647,12 +3609,6 @@ void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
 }
 #endif
 
-void *__kmalloc(size_t size, gfp_t flags)
-{
-	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
-}
-EXPORT_SYMBOL(__kmalloc);
-
 static __always_inline
 void __do_kmem_cache_free(struct kmem_cache *cachep, void *objp,
 			  unsigned long caller)
@@ -3730,43 +3686,6 @@ void kmem_cache_free_bulk(struct kmem_cache *orig_s, size_t size, void **p)
 }
 EXPORT_SYMBOL(kmem_cache_free_bulk);
 
-/**
- * kfree - free previously allocated memory
- * @objp: pointer returned by kmalloc.
- *
- * If @objp is NULL, no operation is performed.
- *
- * Don't free memory not originally allocated by kmalloc()
- * or you will run into trouble.
- */
-void kfree(const void *objp)
-{
-	struct kmem_cache *c;
-	unsigned long flags;
-	struct folio *folio;
-
-	trace_kfree(_RET_IP_, objp);
-
-	if (unlikely(ZERO_OR_NULL_PTR(objp)))
-		return;
-
-	folio = virt_to_folio(objp);
-	if (!folio_test_slab(folio)) {
-		free_large_kmalloc(folio, (void *)objp);
-		return;
-	}
-
-	c = folio_slab(folio)->slab_cache;
-
-	local_irq_save(flags);
-	kfree_debugcheck(objp);
-	debug_check_no_locks_freed(objp, c->object_size);
-	debug_check_no_obj_freed(objp, c->object_size);
-	__cache_free(c, (void *)objp, _RET_IP_);
-	local_irq_restore(flags);
-}
-EXPORT_SYMBOL(kfree);
-
 /*
  * This initializes kmem_cache_node or resizes various caches for all nodes.
  */
@@ -4169,30 +4088,3 @@ void __check_heap_object(const void *ptr, unsigned long n,
 	usercopy_abort("SLAB object", cachep->name, to_user, offset, n);
 }
 #endif /* CONFIG_HARDENED_USERCOPY */
-
-/**
- * __ksize -- Uninstrumented ksize.
- * @objp: pointer to the object
- *
- * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
- * safety checks as ksize() with KASAN instrumentation enabled.
- *
- * Return: size of the actual memory used by @objp in bytes
- */
-size_t __ksize(const void *objp)
-{
-	struct kmem_cache *c;
-	struct folio *folio;
-
-	BUG_ON(!objp);
-	if (unlikely(objp == ZERO_SIZE_PTR))
-		return 0;
-
-	folio = virt_to_folio(objp);
-	if (!folio_test_slab(folio))
-		return folio_size(folio);
-
-	c = folio_slab(folio)->slab_cache;
-	return c->object_size;
-}
-EXPORT_SYMBOL(__ksize);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 0dff46fb7193..1000e05c77df 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -924,6 +924,108 @@ void free_large_kmalloc(struct folio *folio, void *object)
 			      -(PAGE_SIZE << order));
 	__free_pages(folio_page(folio, 0), order);
 }
+
+static __always_inline
+void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
+{
+	struct kmem_cache *s;
+	void *ret;
+
+	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
+		ret = kmalloc_large_node_notrace(size, flags, node);
+		trace_kmalloc_node(caller, ret, NULL,
+				   size, PAGE_SIZE << get_order(size),
+				   flags, node);
+		return ret;
+	}
+
+	s = kmalloc_slab(size, flags);
+
+	if (unlikely(ZERO_OR_NULL_PTR(s)))
+		return s;
+
+	ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
+	ret = kasan_kmalloc(s, ret, size, flags);
+	trace_kmalloc_node(caller, ret, s, size,
+			   s->size, flags, node);
+	return ret;
+}
+
+void *__kmalloc_node(size_t size, gfp_t flags, int node)
+{
+	return __do_kmalloc_node(size, flags, node, _RET_IP_);
+}
+EXPORT_SYMBOL(__kmalloc_node);
+
+void *__kmalloc(size_t size, gfp_t flags)
+{
+	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
+}
+EXPORT_SYMBOL(__kmalloc);
+
+void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
+				  int node, unsigned long caller)
+{
+	return __do_kmalloc_node(size, flags, node, caller);
+}
+EXPORT_SYMBOL(__kmalloc_node_track_caller);
+
+/**
+ * kfree - free previously allocated memory
+ * @objp: pointer returned by kmalloc.
+ *
+ * If @objp is NULL, no operation is performed.
+ *
+ * Don't free memory not originally allocated by kmalloc()
+ * or you will run into trouble.
+ */
+void kfree(const void *object)
+{
+	struct folio *folio;
+	struct slab *slab;
+	struct kmem_cache *s;
+
+	trace_kfree(_RET_IP_, object);
+
+	if (unlikely(ZERO_OR_NULL_PTR(object)))
+		return;
+
+	folio = virt_to_folio(object);
+	if (unlikely(!folio_test_slab(folio))) {
+		free_large_kmalloc(folio, (void *)object);
+		return;
+	}
+
+	slab = folio_slab(folio);
+	s = slab->slab_cache;
+	__kmem_cache_free(s, (void *)object, _RET_IP_);
+}
+EXPORT_SYMBOL(kfree);
+
+/**
+ * __ksize -- Uninstrumented ksize.
+ * @objp: pointer to the object
+ *
+ * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
+ * safety checks as ksize() with KASAN instrumentation enabled.
+ *
+ * Return: size of the actual memory used by @objp in bytes
+ */
+size_t __ksize(const void *object)
+{
+	struct folio *folio;
+
+	if (unlikely(object == ZERO_SIZE_PTR))
+		return 0;
+
+	folio = virt_to_folio(object);
+
+	if (unlikely(!folio_test_slab(folio)))
+		return folio_size(folio);
+
+	return slab_ksize(folio_slab(folio)->slab_cache);
+}
+EXPORT_SYMBOL(__ksize);
 #endif /* !CONFIG_SLOB */
 
 gfp_t kmalloc_fix_flags(gfp_t flags)
diff --git a/mm/slub.c b/mm/slub.c
index 74eb78678c98..836292c32e58 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4388,49 +4388,6 @@ static int __init setup_slub_min_objects(char *str)
 
 __setup("slub_min_objects=", setup_slub_min_objects);
 
-static __always_inline
-void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
-{
-	struct kmem_cache *s;
-	void *ret;
-
-	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
-		ret = kmalloc_large_node_notrace(size, flags, node);
-
-		trace_kmalloc_node(caller, ret, NULL,
-				   size, PAGE_SIZE << get_order(size),
-				   flags, node);
-
-		return ret;
-	}
-
-	s = kmalloc_slab(size, flags);
-
-	if (unlikely(ZERO_OR_NULL_PTR(s)))
-		return s;
-
-	ret = slab_alloc_node(s, NULL, flags, node, caller, size);
-
-	trace_kmalloc_node(caller, ret, s, size, s->size, flags, node);
-
-	ret = kasan_kmalloc(s, ret, size, flags);
-
-	return ret;
-}
-
-void *__kmalloc_node(size_t size, gfp_t flags, int node)
-{
-	return __do_kmalloc_node(size, flags, node, _RET_IP_);
-}
-EXPORT_SYMBOL(__kmalloc_node);
-
-void *__kmalloc(size_t size, gfp_t flags)
-{
-	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
-}
-EXPORT_SYMBOL(__kmalloc);
-
-
 #ifdef CONFIG_HARDENED_USERCOPY
 /*
  * Rejects incorrectly sized objects and objects that are to be copied
@@ -4481,43 +4438,6 @@ void __check_heap_object(const void *ptr, unsigned long n,
 }
 #endif /* CONFIG_HARDENED_USERCOPY */
 
-size_t __ksize(const void *object)
-{
-	struct folio *folio;
-
-	if (unlikely(object == ZERO_SIZE_PTR))
-		return 0;
-
-	folio = virt_to_folio(object);
-
-	if (unlikely(!folio_test_slab(folio)))
-		return folio_size(folio);
-
-	return slab_ksize(folio_slab(folio)->slab_cache);
-}
-EXPORT_SYMBOL(__ksize);
-
-void kfree(const void *x)
-{
-	struct folio *folio;
-	struct slab *slab;
-	void *object = (void *)x;
-
-	trace_kfree(_RET_IP_, x);
-
-	if (unlikely(ZERO_OR_NULL_PTR(x)))
-		return;
-
-	folio = virt_to_folio(x);
-	if (unlikely(!folio_test_slab(folio))) {
-		free_large_kmalloc(folio, object);
-		return;
-	}
-	slab = folio_slab(folio);
-	slab_free(slab->slab_cache, slab, object, NULL, &object, 1, _RET_IP_);
-}
-EXPORT_SYMBOL(kfree);
-
 #define SHRINK_PROMOTE_MAX 32
 
 /*
@@ -4863,13 +4783,6 @@ int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags)
 	return 0;
 }
 
-void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
-				  int node, unsigned long caller)
-{
-	return __do_kmalloc_node(size, gfpflags, node, caller);
-}
-EXPORT_SYMBOL(__kmalloc_node_track_caller);
-
 #ifdef CONFIG_SYSFS
 static int count_inuse(struct slab *slab)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 13/15] mm/slab_common: unify NUMA and UMA version of tracepoints
  2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
                   ` (11 preceding siblings ...)
  2022-07-12 13:39 ` [PATCH v3 12/15] mm/sl[au]b: generalize kmalloc subsystem Hyeonggon Yoo
@ 2022-07-12 13:39 ` Hyeonggon Yoo
  2022-07-29 10:52   ` Vlastimil Babka
  2022-07-12 13:39 ` [PATCH 14/16] mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using Hyeonggon Yoo
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-12 13:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo,
	Joe Perches, Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

Drop the kmem_alloc event class, rename kmem_alloc_node to kmem_alloc, and
remove the _node suffix from the NUMA versions of the tracepoints.

This will break some tools that depend on the {kmem_cache_alloc,kmalloc}_node
events, but at this point maintaining both the kmem_alloc and kmem_alloc_node
event classes no longer makes sense.
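
Call sites that have no node information (the former UMA variants) now pass
NUMA_NO_NODE explicitly; condensed from the SLUB hunks below:

	/* former UMA call site: same event, node reported as NUMA_NO_NODE */
	trace_kmem_cache_alloc(_RET_IP_, ret, s, s->object_size,
			       s->size, gfpflags, NUMA_NO_NODE);

	/* former kmem_cache_alloc_node call site: the _node event name is gone */
	trace_kmem_cache_alloc(_RET_IP_, ret, s,
			       s->object_size, s->size, gfpflags, node);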

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 include/trace/events/kmem.h | 60 ++-----------------------------------
 mm/slab.c                   | 18 +++++------
 mm/slab_common.c            | 18 +++++------
 mm/slob.c                   | 20 ++++++-------
 mm/slub.c                   | 12 ++++----
 5 files changed, 35 insertions(+), 93 deletions(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 4cb51ace600d..e078ebcdc4b1 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -11,62 +11,6 @@
 
 DECLARE_EVENT_CLASS(kmem_alloc,
 
-	TP_PROTO(unsigned long call_site,
-		 const void *ptr,
-		 struct kmem_cache *s,
-		 size_t bytes_req,
-		 size_t bytes_alloc,
-		 gfp_t gfp_flags),
-
-	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags),
-
-	TP_STRUCT__entry(
-		__field(	unsigned long,	call_site	)
-		__field(	const void *,	ptr		)
-		__field(	size_t,		bytes_req	)
-		__field(	size_t,		bytes_alloc	)
-		__field(	unsigned long,	gfp_flags	)
-		__field(	bool,		accounted	)
-	),
-
-	TP_fast_assign(
-		__entry->call_site	= call_site;
-		__entry->ptr		= ptr;
-		__entry->bytes_req	= bytes_req;
-		__entry->bytes_alloc	= bytes_alloc;
-		__entry->gfp_flags	= (__force unsigned long)gfp_flags;
-		__entry->accounted	= IS_ENABLED(CONFIG_MEMCG_KMEM) ?
-					  ((gfp_flags & __GFP_ACCOUNT) ||
-					  (s && s->flags & SLAB_ACCOUNT)) : false;
-	),
-
-	TP_printk("call_site=%pS ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s accounted=%s",
-		(void *)__entry->call_site,
-		__entry->ptr,
-		__entry->bytes_req,
-		__entry->bytes_alloc,
-		show_gfp_flags(__entry->gfp_flags),
-		__entry->accounted ? "true" : "false")
-);
-
-DEFINE_EVENT(kmem_alloc, kmalloc,
-
-	TP_PROTO(unsigned long call_site, const void *ptr, struct kmem_cache *s,
-		 size_t bytes_req, size_t bytes_alloc, gfp_t gfp_flags),
-
-	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags)
-);
-
-DEFINE_EVENT(kmem_alloc, kmem_cache_alloc,
-
-	TP_PROTO(unsigned long call_site, const void *ptr, struct kmem_cache *s,
-		 size_t bytes_req, size_t bytes_alloc, gfp_t gfp_flags),
-
-	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags)
-);
-
-DECLARE_EVENT_CLASS(kmem_alloc_node,
-
 	TP_PROTO(unsigned long call_site,
 		 const void *ptr,
 		 struct kmem_cache *s,
@@ -109,7 +53,7 @@ DECLARE_EVENT_CLASS(kmem_alloc_node,
 		__entry->accounted ? "true" : "false")
 );
 
-DEFINE_EVENT(kmem_alloc_node, kmalloc_node,
+DEFINE_EVENT(kmem_alloc, kmalloc,
 
 	TP_PROTO(unsigned long call_site, const void *ptr,
 		 struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
@@ -118,7 +62,7 @@ DEFINE_EVENT(kmem_alloc_node, kmalloc_node,
 	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
 );
 
-DEFINE_EVENT(kmem_alloc_node, kmem_cache_alloc_node,
+DEFINE_EVENT(kmem_alloc, kmem_cache_alloc,
 
 	TP_PROTO(unsigned long call_site, const void *ptr,
 		 struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
diff --git a/mm/slab.c b/mm/slab.c
index 6407dad13d5c..a361d2f9d4d9 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3440,8 +3440,8 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
 {
 	void *ret = slab_alloc(cachep, lru, flags, cachep->object_size, _RET_IP_);
 
-	trace_kmem_cache_alloc(_RET_IP_, ret, cachep,
-			       cachep->object_size, cachep->size, flags);
+	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, cachep->object_size,
+			       cachep->size, flags, NUMA_NO_NODE);
 
 	return ret;
 }
@@ -3529,7 +3529,7 @@ kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
 
 	ret = kasan_kmalloc(cachep, ret, size, flags);
 	trace_kmalloc(_RET_IP_, ret, cachep,
-		      size, cachep->size, flags);
+		      size, cachep->size, flags, NUMA_NO_NODE);
 	return ret;
 }
 EXPORT_SYMBOL(kmem_cache_alloc_trace);
@@ -3552,9 +3552,9 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 {
 	void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);
 
-	trace_kmem_cache_alloc_node(_RET_IP_, ret, cachep,
-				    cachep->object_size, cachep->size,
-				    flags, nodeid);
+	trace_kmem_cache_alloc(_RET_IP_, ret, cachep,
+			       cachep->object_size, cachep->size,
+			       flags, nodeid);
 
 	return ret;
 }
@@ -3579,9 +3579,9 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
 	ret = slab_alloc_node(cachep, NULL, flags, nodeid, size, _RET_IP_);
 
 	ret = kasan_kmalloc(cachep, ret, size, flags);
-	trace_kmalloc_node(_RET_IP_, ret, cachep,
-			   size, cachep->size,
-			   flags, nodeid);
+	trace_kmalloc(_RET_IP_, ret, cachep,
+		      size, cachep->size,
+		      flags, nodeid);
 	return ret;
 }
 EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1000e05c77df..0e66b4911ebf 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -933,9 +933,9 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
 
 	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
 		ret = kmalloc_large_node_notrace(size, flags, node);
-		trace_kmalloc_node(caller, ret, NULL,
-				   size, PAGE_SIZE << get_order(size),
-				   flags, node);
+		trace_kmalloc(_RET_IP_, ret, NULL,
+			      size, PAGE_SIZE << get_order(size),
+			      flags, node);
 		return ret;
 	}
 
@@ -946,8 +946,8 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
 
 	ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
 	ret = kasan_kmalloc(s, ret, size, flags);
-	trace_kmalloc_node(caller, ret, s, size,
-			   s->size, flags, node);
+	trace_kmalloc(_RET_IP_, ret, s, size,
+		      s->size, flags, node);
 	return ret;
 }
 
@@ -1076,7 +1076,7 @@ void *kmalloc_large(size_t size, gfp_t flags)
 	void *ret = __kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
 
 	trace_kmalloc(_RET_IP_, ret, NULL, size,
-		      PAGE_SIZE << get_order(size), flags);
+		      PAGE_SIZE << get_order(size), flags, NUMA_NO_NODE);
 	return ret;
 }
 EXPORT_SYMBOL(kmalloc_large);
@@ -1090,8 +1090,8 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 {
 	void *ret = __kmalloc_large_node_notrace(size, flags, node);
 
-	trace_kmalloc_node(_RET_IP_, ret, NULL, size,
-			   PAGE_SIZE << get_order(size), flags, node);
+	trace_kmalloc(_RET_IP_, ret, NULL, size,
+		      PAGE_SIZE << get_order(size), flags, node);
 	return ret;
 }
 EXPORT_SYMBOL(kmalloc_large_node);
@@ -1426,8 +1426,6 @@ EXPORT_SYMBOL(ksize);
 /* Tracepoints definitions. */
 EXPORT_TRACEPOINT_SYMBOL(kmalloc);
 EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
-EXPORT_TRACEPOINT_SYMBOL(kmalloc_node);
-EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc_node);
 EXPORT_TRACEPOINT_SYMBOL(kfree);
 EXPORT_TRACEPOINT_SYMBOL(kmem_cache_free);
 
diff --git a/mm/slob.c b/mm/slob.c
index 80cdbe4f0d67..a4d50d957c25 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -507,8 +507,8 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 		*m = size;
 		ret = (void *)m + minalign;
 
-		trace_kmalloc_node(caller, ret, NULL,
-				   size, size + minalign, gfp, node);
+		trace_kmalloc(caller, ret, NULL,
+			      size, size + minalign, gfp, node);
 	} else {
 		unsigned int order = get_order(size);
 
@@ -516,8 +516,8 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 			gfp |= __GFP_COMP;
 		ret = slob_new_pages(gfp, order, node);
 
-		trace_kmalloc_node(caller, ret, NULL,
-				   size, PAGE_SIZE << order, gfp, node);
+		trace_kmalloc(caller, ret, NULL,
+			      size, PAGE_SIZE << order, gfp, node);
 	}
 
 	kmemleak_alloc(ret, size, 1, gfp);
@@ -608,14 +608,14 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
 
 	if (c->size < PAGE_SIZE) {
 		b = slob_alloc(c->size, flags, c->align, node, 0);
-		trace_kmem_cache_alloc_node(_RET_IP_, b, NULL, c->object_size,
-					    SLOB_UNITS(c->size) * SLOB_UNIT,
-					    flags, node);
+		trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
+				       SLOB_UNITS(c->size) * SLOB_UNIT,
+				       flags, node);
 	} else {
 		b = slob_new_pages(flags, get_order(c->size), node);
-		trace_kmem_cache_alloc_node(_RET_IP_, b, NULL, c->object_size,
-					    PAGE_SIZE << get_order(c->size),
-					    flags, node);
+		trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
+				       PAGE_SIZE << get_order(c->size),
+				       flags, node);
 	}
 
 	if (b && c->ctor) {
diff --git a/mm/slub.c b/mm/slub.c
index 836292c32e58..f1aa51480dc4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3244,7 +3244,7 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
 	void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
 
 	trace_kmem_cache_alloc(_RET_IP_, ret, s, s->object_size,
-				s->size, gfpflags);
+				s->size, gfpflags, NUMA_NO_NODE);
 
 	return ret;
 }
@@ -3274,7 +3274,7 @@ void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
 void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
 {
 	void *ret = slab_alloc(s, NULL, gfpflags, _RET_IP_, size);
-	trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags);
+	trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, NUMA_NO_NODE);
 	ret = kasan_kmalloc(s, ret, size, gfpflags);
 	return ret;
 }
@@ -3285,8 +3285,8 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
 {
 	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
 
-	trace_kmem_cache_alloc_node(_RET_IP_, ret, s,
-				    s->object_size, s->size, gfpflags, node);
+	trace_kmem_cache_alloc(_RET_IP_, ret, s,
+			       s->object_size, s->size, gfpflags, node);
 
 	return ret;
 }
@@ -3299,8 +3299,8 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
 {
 	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
 
-	trace_kmalloc_node(_RET_IP_, ret, s,
-			   size, s->size, gfpflags, node);
+	trace_kmalloc(_RET_IP_, ret, s,
+		      size, s->size, gfpflags, node);
 
 	ret = kasan_kmalloc(s, ret, size, gfpflags);
 	return ret;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 14/16] mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using
  2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
                   ` (12 preceding siblings ...)
  2022-07-12 13:39 ` [PATCH v3 13/15] mm/slab_common: unify NUMA and UMA version of tracepoints Hyeonggon Yoo
@ 2022-07-12 13:39 ` Hyeonggon Yoo
  2022-07-29 11:23   ` Vlastimil Babka
  2022-07-12 13:39 ` [PATCH 15/16] mm/slab_common: move definition of __ksize() to mm/slab.h Hyeonggon Yoo
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-12 13:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo,
	Joe Perches, Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

Drop the kmem_alloc event class and define the kmalloc and kmem_cache_alloc
events using the TRACE_EVENT() macro.

In addition, this patch does the following (see the sketch after this list):
   - Do not pass a pointer to struct kmem_cache to trace_kmalloc.
     The gfp flags are enough to know whether the allocation is accounted.
   - Avoid dereferencing s->object_size and s->size when the kmem_cache_alloc
     event is not enabled.
   - Avoid dereferencing s->name when the kmem_cache_free event is not enabled.
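
The resulting call sites look roughly like this (condensed from the hunks
below):

	/* kmem_cache_alloc event: sizes are read from @s inside the event */
	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, node);

	/* kmalloc event: no kmem_cache pointer is passed at all */
	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);

	/* kmem_cache_free event: takes the cache and resolves s->name itself */
	trace_kmem_cache_free(_RET_IP_, x, s);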

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 include/trace/events/kmem.h | 64 +++++++++++++++++++++++++------------
 mm/slab.c                   | 16 ++++------
 mm/slab_common.c            | 17 ++++++----
 mm/slob.c                   | 16 +++-------
 mm/slub.c                   | 13 +++-----
 5 files changed, 70 insertions(+), 56 deletions(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index e078ebcdc4b1..0ded2c351062 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -9,17 +9,15 @@
 #include <linux/tracepoint.h>
 #include <trace/events/mmflags.h>
 
-DECLARE_EVENT_CLASS(kmem_alloc,
+TRACE_EVENT(kmem_cache_alloc,
 
 	TP_PROTO(unsigned long call_site,
 		 const void *ptr,
 		 struct kmem_cache *s,
-		 size_t bytes_req,
-		 size_t bytes_alloc,
 		 gfp_t gfp_flags,
 		 int node),
 
-	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node),
+	TP_ARGS(call_site, ptr, s, gfp_flags, node),
 
 	TP_STRUCT__entry(
 		__field(	unsigned long,	call_site	)
@@ -34,8 +32,8 @@ DECLARE_EVENT_CLASS(kmem_alloc,
 	TP_fast_assign(
 		__entry->call_site	= call_site;
 		__entry->ptr		= ptr;
-		__entry->bytes_req	= bytes_req;
-		__entry->bytes_alloc	= bytes_alloc;
+		__entry->bytes_req	= s->object_size;
+		__entry->bytes_alloc	= s->size;
 		__entry->gfp_flags	= (__force unsigned long)gfp_flags;
 		__entry->node		= node;
 		__entry->accounted	= IS_ENABLED(CONFIG_MEMCG_KMEM) ?
@@ -53,22 +51,46 @@ DECLARE_EVENT_CLASS(kmem_alloc,
 		__entry->accounted ? "true" : "false")
 );
 
-DEFINE_EVENT(kmem_alloc, kmalloc,
+TRACE_EVENT(kmalloc,
 
-	TP_PROTO(unsigned long call_site, const void *ptr,
-		 struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
-		 gfp_t gfp_flags, int node),
+	TP_PROTO(unsigned long call_site,
+		 const void *ptr,
+		 size_t bytes_req,
+		 size_t bytes_alloc,
+		 gfp_t gfp_flags,
+		 int node),
 
-	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
-);
+	TP_ARGS(call_site, ptr, bytes_req, bytes_alloc, gfp_flags, node),
 
-DEFINE_EVENT(kmem_alloc, kmem_cache_alloc,
+	TP_STRUCT__entry(
+		__field(	unsigned long,	call_site	)
+		__field(	const void *,	ptr		)
+		__field(	size_t,		bytes_req	)
+		__field(	size_t,		bytes_alloc	)
+		__field(	unsigned long,	gfp_flags	)
+		__field(	int,		node		)
+		__field(	bool,		accounted	)
+	),
 
-	TP_PROTO(unsigned long call_site, const void *ptr,
-		 struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
-		 gfp_t gfp_flags, int node),
+	TP_fast_assign(
+		__entry->call_site	= call_site;
+		__entry->ptr		= ptr;
+		__entry->bytes_req	= bytes_req;
+		__entry->bytes_alloc	= bytes_alloc;
+		__entry->gfp_flags	= (__force unsigned long)gfp_flags;
+		__entry->node		= node;
+		__entry->accounted	= IS_ENABLED(CONFIG_MEMCG_KMEM) ?
+					  (gfp_flags & __GFP_ACCOUNT) : false;
+	),
 
-	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
+	TP_printk("call_site=%pS ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d accounted=%s",
+		(void *)__entry->call_site,
+		__entry->ptr,
+		__entry->bytes_req,
+		__entry->bytes_alloc,
+		show_gfp_flags(__entry->gfp_flags),
+		__entry->node,
+		__entry->accounted ? "true" : "false")
 );
 
 TRACE_EVENT(kfree,
@@ -93,20 +115,20 @@ TRACE_EVENT(kfree,
 
 TRACE_EVENT(kmem_cache_free,
 
-	TP_PROTO(unsigned long call_site, const void *ptr, const char *name),
+	TP_PROTO(unsigned long call_site, const void *ptr, const struct kmem_cache *s),
 
-	TP_ARGS(call_site, ptr, name),
+	TP_ARGS(call_site, ptr, s),
 
 	TP_STRUCT__entry(
 		__field(	unsigned long,	call_site	)
 		__field(	const void *,	ptr		)
-		__string(	name,	name	)
+		__string(	name,		s->name		)
 	),
 
 	TP_fast_assign(
 		__entry->call_site	= call_site;
 		__entry->ptr		= ptr;
-		__assign_str(name, name);
+		__assign_str(name, s->name);
 	),
 
 	TP_printk("call_site=%pS ptr=%p name=%s",
diff --git a/mm/slab.c b/mm/slab.c
index f9b74831e3f4..1685e5507a59 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3447,8 +3447,7 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
 {
 	void *ret = slab_alloc(cachep, lru, flags, cachep->object_size, _RET_IP_);
 
-	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, cachep->object_size,
-			       cachep->size, flags, NUMA_NO_NODE);
+	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, flags, NUMA_NO_NODE);
 
 	return ret;
 }
@@ -3537,8 +3536,9 @@ kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
 	ret = slab_alloc(cachep, NULL, flags, size, _RET_IP_);
 
 	ret = kasan_kmalloc(cachep, ret, size, flags);
-	trace_kmalloc(_RET_IP_, ret, cachep,
-		      size, cachep->size, flags, NUMA_NO_NODE);
+	trace_kmalloc(_RET_IP_, ret,
+		      size, cachep->size,
+		      flags, NUMA_NO_NODE);
 	return ret;
 }
 EXPORT_SYMBOL(kmem_cache_alloc_trace);
@@ -3561,9 +3561,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 {
 	void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);
 
-	trace_kmem_cache_alloc(_RET_IP_, ret, cachep,
-			       cachep->object_size, cachep->size,
-			       flags, nodeid);
+	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, flags, nodeid);
 
 	return ret;
 }
@@ -3588,7 +3586,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
 	ret = slab_alloc_node(cachep, NULL, flags, nodeid, size, _RET_IP_);
 
 	ret = kasan_kmalloc(cachep, ret, size, flags);
-	trace_kmalloc(_RET_IP_, ret, cachep,
+	trace_kmalloc(_RET_IP_, ret,
 		      size, cachep->size,
 		      flags, nodeid);
 	return ret;
@@ -3652,7 +3650,7 @@ void kmem_cache_free(struct kmem_cache *cachep, void *objp)
 	if (!cachep)
 		return;
 
-	trace_kmem_cache_free(_RET_IP_, objp, cachep->name);
+	trace_kmem_cache_free(_RET_IP_, objp, cachep);
 	__do_kmem_cache_free(cachep, objp, _RET_IP_);
 }
 EXPORT_SYMBOL(kmem_cache_free);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 0e66b4911ebf..c01c6b8f0d34 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -933,7 +933,7 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
 
 	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
 		ret = kmalloc_large_node_notrace(size, flags, node);
-		trace_kmalloc(_RET_IP_, ret, NULL,
+		trace_kmalloc(_RET_IP_, ret,
 			      size, PAGE_SIZE << get_order(size),
 			      flags, node);
 		return ret;
@@ -946,8 +946,9 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
 
 	ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
 	ret = kasan_kmalloc(s, ret, size, flags);
-	trace_kmalloc(_RET_IP_, ret, s, size,
-		      s->size, flags, node);
+	trace_kmalloc(_RET_IP_, ret,
+		      size, s->size,
+		      flags, node);
 	return ret;
 }
 
@@ -1075,8 +1076,9 @@ void *kmalloc_large(size_t size, gfp_t flags)
 {
 	void *ret = __kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
 
-	trace_kmalloc(_RET_IP_, ret, NULL, size,
-		      PAGE_SIZE << get_order(size), flags, NUMA_NO_NODE);
+	trace_kmalloc(_RET_IP_, ret,
+		      size, PAGE_SIZE << get_order(size),
+		      flags, NUMA_NO_NODE);
 	return ret;
 }
 EXPORT_SYMBOL(kmalloc_large);
@@ -1090,8 +1092,9 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 {
 	void *ret = __kmalloc_large_node_notrace(size, flags, node);
 
-	trace_kmalloc(_RET_IP_, ret, NULL, size,
-		      PAGE_SIZE << get_order(size), flags, node);
+	trace_kmalloc(_RET_IP_, ret,
+		      size, PAGE_SIZE << get_order(size),
+		      flags, node);
 	return ret;
 }
 EXPORT_SYMBOL(kmalloc_large_node);
diff --git a/mm/slob.c b/mm/slob.c
index a4d50d957c25..97a4d2407f96 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -507,8 +507,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 		*m = size;
 		ret = (void *)m + minalign;
 
-		trace_kmalloc(caller, ret, NULL,
-			      size, size + minalign, gfp, node);
+		trace_kmalloc(caller, ret, size, size + minalign, gfp, node);
 	} else {
 		unsigned int order = get_order(size);
 
@@ -516,8 +515,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 			gfp |= __GFP_COMP;
 		ret = slob_new_pages(gfp, order, node);
 
-		trace_kmalloc(caller, ret, NULL,
-			      size, PAGE_SIZE << order, gfp, node);
+		trace_kmalloc(caller, ret, size, PAGE_SIZE << order, gfp, node);
 	}
 
 	kmemleak_alloc(ret, size, 1, gfp);
@@ -608,14 +606,10 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
 
 	if (c->size < PAGE_SIZE) {
 		b = slob_alloc(c->size, flags, c->align, node, 0);
-		trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
-				       SLOB_UNITS(c->size) * SLOB_UNIT,
-				       flags, node);
+		trace_kmem_cache_alloc(_RET_IP_, b, c, flags, node);
 	} else {
 		b = slob_new_pages(flags, get_order(c->size), node);
-		trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
-				       PAGE_SIZE << get_order(c->size),
-				       flags, node);
+		trace_kmem_cache_alloc(_RET_IP_, b, c, flags, node);
 	}
 
 	if (b && c->ctor) {
@@ -671,7 +665,7 @@ static void kmem_rcu_free(struct rcu_head *head)
 void kmem_cache_free(struct kmem_cache *c, void *b)
 {
 	kmemleak_free_recursive(b, c->flags);
-	trace_kmem_cache_free(_RET_IP_, b, c->name);
+	trace_kmem_cache_free(_RET_IP_, b, c);
 	if (unlikely(c->flags & SLAB_TYPESAFE_BY_RCU)) {
 		struct slob_rcu *slob_rcu;
 		slob_rcu = b + (c->size - sizeof(struct slob_rcu));
diff --git a/mm/slub.c b/mm/slub.c
index f1aa51480dc4..a49c69469c64 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3243,8 +3243,7 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
 {
 	void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
 
-	trace_kmem_cache_alloc(_RET_IP_, ret, s, s->object_size,
-				s->size, gfpflags, NUMA_NO_NODE);
+	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
 
 	return ret;
 }
@@ -3274,7 +3273,7 @@ void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
 void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
 {
 	void *ret = slab_alloc(s, NULL, gfpflags, _RET_IP_, size);
-	trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, NUMA_NO_NODE);
+	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
 	ret = kasan_kmalloc(s, ret, size, gfpflags);
 	return ret;
 }
@@ -3285,8 +3284,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
 {
 	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
 
-	trace_kmem_cache_alloc(_RET_IP_, ret, s,
-			       s->object_size, s->size, gfpflags, node);
+	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, node);
 
 	return ret;
 }
@@ -3299,8 +3297,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
 {
 	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
 
-	trace_kmalloc(_RET_IP_, ret, s,
-		      size, s->size, gfpflags, node);
+	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);
 
 	ret = kasan_kmalloc(s, ret, size, gfpflags);
 	return ret;
@@ -3544,7 +3541,7 @@ void kmem_cache_free(struct kmem_cache *s, void *x)
 	s = cache_from_obj(s, x);
 	if (!s)
 		return;
-	trace_kmem_cache_free(_RET_IP_, x, s->name);
+	trace_kmem_cache_free(_RET_IP_, x, s);
 	slab_free(s, virt_to_slab(x), x, NULL, &x, 1, _RET_IP_);
 }
 EXPORT_SYMBOL(kmem_cache_free);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 15/16] mm/slab_common: move definition of __ksize() to mm/slab.h
  2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
                   ` (13 preceding siblings ...)
  2022-07-12 13:39 ` [PATCH 14/16] mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using Hyeonggon Yoo
@ 2022-07-12 13:39 ` Hyeonggon Yoo
  2022-07-29 11:47   ` Vlastimil Babka
  2022-07-12 13:39 ` [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize() Hyeonggon Yoo
  2022-07-29 15:08 ` [PATCH v3 00/15] common kmalloc v3 Vlastimil Babka
  16 siblings, 1 reply; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-12 13:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo,
	Joe Perches, Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

__ksize() is only called by KASAN. Remove the export symbol and move
the declaration to mm/slab.h, as we don't want to grow its callers.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slab.h |  1 -
 mm/slab.h            |  2 ++
 mm/slab_common.c     | 11 +----------
 mm/slob.c            |  1 -
 4 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 4ee5b2fed164..701fe538650f 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -187,7 +187,6 @@ int kmem_cache_shrink(struct kmem_cache *s);
 void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __alloc_size(2);
 void kfree(const void *objp);
 void kfree_sensitive(const void *objp);
-size_t __ksize(const void *objp);
 size_t ksize(const void *objp);
 #ifdef CONFIG_PRINTK
 bool kmem_valid_obj(void *object);
diff --git a/mm/slab.h b/mm/slab.h
index 9193e9c1f040..ad634e02b3cb 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -678,6 +678,8 @@ void free_large_kmalloc(struct folio *folio, void *object);
 
 #endif /* CONFIG_SLOB */
 
+size_t __ksize(const void *objp);
+
 static inline size_t slab_ksize(const struct kmem_cache *s)
 {
 #ifndef CONFIG_SLUB
diff --git a/mm/slab_common.c b/mm/slab_common.c
index c01c6b8f0d34..1f8db7959366 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1003,15 +1003,7 @@ void kfree(const void *object)
 }
 EXPORT_SYMBOL(kfree);
 
-/**
- * __ksize -- Uninstrumented ksize.
- * @objp: pointer to the object
- *
- * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
- * safety checks as ksize() with KASAN instrumentation enabled.
- *
- * Return: size of the actual memory used by @objp in bytes
- */
+/* Uninstrumented ksize. Only called by KASAN. */
 size_t __ksize(const void *object)
 {
 	struct folio *folio;
@@ -1026,7 +1018,6 @@ size_t __ksize(const void *object)
 
 	return slab_ksize(folio_slab(folio)->slab_cache);
 }
-EXPORT_SYMBOL(__ksize);
 #endif /* !CONFIG_SLOB */
 
 gfp_t kmalloc_fix_flags(gfp_t flags)
diff --git a/mm/slob.c b/mm/slob.c
index 97a4d2407f96..91d6e2b19929 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -584,7 +584,6 @@ size_t __ksize(const void *block)
 	m = (unsigned int *)(block - align);
 	return SLOB_UNITS(*m) * SLOB_UNIT;
 }
-EXPORT_SYMBOL(__ksize);
 
 int __kmem_cache_create(struct kmem_cache *c, slab_flags_t flags)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()
  2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
                   ` (14 preceding siblings ...)
  2022-07-12 13:39 ` [PATCH 15/16] mm/slab_common: move definition of __ksize() to mm/slab.h Hyeonggon Yoo
@ 2022-07-12 13:39 ` Hyeonggon Yoo
  2022-07-12 15:13   ` Christoph Lameter
  2022-07-29 11:50   ` Vlastimil Babka
  2022-07-29 15:08 ` [PATCH v3 00/15] common kmalloc v3 Vlastimil Babka
  16 siblings, 2 replies; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-12 13:39 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Hyeonggon Yoo,
	Joe Perches, Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

__ksize() returns the size of objects allocated from the slab allocator.
When an invalid object is passed to __ksize(), returning zero
prevents further memory corruption and lets the caller check
whether there is an error.

If the address of a large object is not the beginning of its folio, or the
folio is too small, the object must be invalid. Return zero in such cases.
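
As a minimal sketch of the failure mode that returning zero is meant to
limit (obj is a hypothetical invalid pointer; see also the discussion
below):

	size = ksize(obj);	/* not a valid kmalloc()/kmem_cache_alloc() object:
				 * with this patch, WARN and return 0 */
	memset(obj, 0, size);	/* writes nothing instead of up to folio_size() bytes */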

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/slab_common.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1f8db7959366..0d6cbe9d7ad0 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1013,8 +1013,12 @@ size_t __ksize(const void *object)
 
 	folio = virt_to_folio(object);
 
-	if (unlikely(!folio_test_slab(folio)))
+	if (unlikely(!folio_test_slab(folio))) {
+		if (WARN_ON(object != folio_address(folio) ||
+				folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE))
+			return 0;
 		return folio_size(folio);
+	}
 
 	return slab_ksize(folio_slab(folio)->slab_cache);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/15] mm/slab: move NUMA-related code to __do_cache_alloc()
  2022-07-12 13:39 ` [PATCH v3 1/15] mm/slab: move NUMA-related code to __do_cache_alloc() Hyeonggon Yoo
@ 2022-07-12 14:29   ` Christoph Lameter
  2022-07-13  9:39     ` Hyeonggon Yoo
  0 siblings, 1 reply; 49+ messages in thread
From: Christoph Lameter @ 2022-07-12 14:29 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Vlastimil Babka, Roman Gushchin, Joe Perches, Vasily Averin,
	Matthew WilCox, linux-kernel, linux-mm

On Tue, 12 Jul 2022, Hyeonggon Yoo wrote:

> @@ -3241,31 +3219,46 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_
>  }
>
>  static __always_inline void *
> -__do_cache_alloc(struct kmem_cache *cache, gfp_t flags)
> +__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid)
>  {
> -	void *objp;
> +	void *objp = NULL;
> +	int slab_node = numa_mem_id();
>
> -	if (current->mempolicy || cpuset_do_slab_mem_spread()) {
> -		objp = alternate_node_alloc(cache, flags);
> -		if (objp)
> -			goto out;
> +	if (nodeid == NUMA_NO_NODE) {
> +		if (current->mempolicy || cpuset_do_slab_mem_spread()) {
> +			objp = alternate_node_alloc(cachep, flags);
> +			if (objp)
> +				goto out;
> +		}
> +		/*
> +		 * Use the locally cached objects if possible.
> +		 * However ____cache_alloc does not allow fallback
> +		 * to other nodes. It may fail while we still have
> +		 * objects on other nodes available.
> +		 */
> +		objp = ____cache_alloc(cachep, flags);
> +		nodeid = slab_node;
> +	} else if (nodeid == slab_node) {
> +		objp = ____cache_alloc(cachep, flags);
> +	} else if (!get_node(cachep, nodeid)) {
> +		/* Node not bootstrapped yet */
> +		objp = fallback_alloc(cachep, flags);
> +		goto out;
>  	}
> -	objp = ____cache_alloc(cache, flags);
>
>  	/*
>  	 * We may just have run out of memory on the local node.
>  	 * ____cache_alloc_node() knows how to locate memory on other nodes
>  	 */
>  	if (!objp)
> -		objp = ____cache_alloc_node(cache, flags, numa_mem_id());
> -
> +		objp = ____cache_alloc_node(cachep, flags, nodeid);


Does this preserve the original behavior? nodeid is the parameter passed
to __do_cache_alloc(). numa_mem_id() is the nearest memory node.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()
  2022-07-12 13:39 ` [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize() Hyeonggon Yoo
@ 2022-07-12 15:13   ` Christoph Lameter
  2022-07-13  9:25     ` Hyeonggon Yoo
  2022-07-29 11:50   ` Vlastimil Babka
  1 sibling, 1 reply; 49+ messages in thread
From: Christoph Lameter @ 2022-07-12 15:13 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Vlastimil Babka, Roman Gushchin, Joe Perches, Vasily Averin,
	Matthew WilCox, linux-kernel, linux-mm

On Tue, 12 Jul 2022, Hyeonggon Yoo wrote:

> __ksize() returns size of objects allocated from slab allocator.
> When invalid object is passed to __ksize(), returning zero
> prevents further memory corruption and makes caller be able to
> check if there is an error.
>
> If address of large object is not beginning of folio or size of
> the folio is too small, it must be invalid. Return zero in such cases.

Why return 0 if there is an error and why bother the callers with these
checks. BUG()?

> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 1f8db7959366..0d6cbe9d7ad0 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1013,8 +1013,12 @@ size_t __ksize(const void *object)
>
>  	folio = virt_to_folio(object);
>
> -	if (unlikely(!folio_test_slab(folio)))
> +	if (unlikely(!folio_test_slab(folio))) {
> +		if (WARN_ON(object != folio_address(folio) ||
> +				folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE))

Hmmm... This may change things a bit. Before this patch it was possible to
determine the storage size of order-0 pages using ksize(). Now this
returns 0?

I guess this is an error since the order-0 page cannot come from slab
allocations.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()
  2022-07-12 15:13   ` Christoph Lameter
@ 2022-07-13  9:25     ` Hyeonggon Yoo
  2022-07-13 10:07       ` Christoph Lameter
  0 siblings, 1 reply; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-13  9:25 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Vlastimil Babka, Roman Gushchin, Joe Perches, Vasily Averin,
	Matthew WilCox, linux-kernel, linux-mm, Marco Elver

On Tue, Jul 12, 2022 at 05:13:44PM +0200, Christoph Lameter wrote:
> On Tue, 12 Jul 2022, Hyeonggon Yoo wrote:
> 
> > __ksize() returns size of objects allocated from slab allocator.
> > When invalid object is passed to __ksize(), returning zero
> > prevents further memory corruption and makes caller be able to
> > check if there is an error.
> >
> > If address of large object is not beginning of folio or size of
> > the folio is too small, it must be invalid. Return zero in such cases.
> 
> Why return 0 if there is an error and why bother the callers with these
> checks. BUG()?

I thought BUG should be used when there is no other solution.

> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 1f8db7959366..0d6cbe9d7ad0 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -1013,8 +1013,12 @@ size_t __ksize(const void *object)
> >
> >  	folio = virt_to_folio(object);
> >
> > -	if (unlikely(!folio_test_slab(folio)))
> > +	if (unlikely(!folio_test_slab(folio))) {
> > +		if (WARN_ON(object != folio_address(folio) ||
> > +				folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE))
> 
> Hmmm... This may change things a bit. Before this patch it was possible to
> determine the storage size of order-0 pages using ksize(). Now this
> returns 0?
> 
> I guess this is an error since the order-0 page cannot come from slab
> allocations.

comment in ksize() says:
	"The caller must guarantee that objp points to a valid object
	previously allocated with either kmalloc() or kmem_cache_alloc()."

It should not be used on an order-0 page that is not allocated from slab. No?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/15] mm/slab: move NUMA-related code to __do_cache_alloc()
  2022-07-12 14:29   ` Christoph Lameter
@ 2022-07-13  9:39     ` Hyeonggon Yoo
  0 siblings, 0 replies; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-13  9:39 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Vlastimil Babka, Roman Gushchin, Joe Perches, Vasily Averin,
	Matthew WilCox, linux-kernel, linux-mm

On Tue, Jul 12, 2022 at 04:29:10PM +0200, Christoph Lameter wrote:
> On Tue, 12 Jul 2022, Hyeonggon Yoo wrote:
> 
> > @@ -3241,31 +3219,46 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_
> >  }
> >
> >  static __always_inline void *
> > -__do_cache_alloc(struct kmem_cache *cache, gfp_t flags)
> > +__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid)
> >  {
> > -	void *objp;
> > +	void *objp = NULL;
> > +	int slab_node = numa_mem_id();
> >
> > -	if (current->mempolicy || cpuset_do_slab_mem_spread()) {
> > -		objp = alternate_node_alloc(cache, flags);
> > -		if (objp)
> > -			goto out;
> > +	if (nodeid == NUMA_NO_NODE) {
> > +		if (current->mempolicy || cpuset_do_slab_mem_spread()) {
> > +			objp = alternate_node_alloc(cachep, flags);
> > +			if (objp)
> > +				goto out;
> > +		}
> > +		/*
> > +		 * Use the locally cached objects if possible.
> > +		 * However ____cache_alloc does not allow fallback
> > +		 * to other nodes. It may fail while we still have
> > +		 * objects on other nodes available.
> > +		 */
> > +		objp = ____cache_alloc(cachep, flags);
> > +		nodeid = slab_node;
> > +	} else if (nodeid == slab_node) {
> > +		objp = ____cache_alloc(cachep, flags);
> > +	} else if (!get_node(cachep, nodeid)) {
> > +		/* Node not bootstrapped yet */
> > +		objp = fallback_alloc(cachep, flags);
> > +		goto out;
> >  	}
> > -	objp = ____cache_alloc(cache, flags);
> >
> >  	/*
> >  	 * We may just have run out of memory on the local node.
> >  	 * ____cache_alloc_node() knows how to locate memory on other nodes
> >  	 */
> >  	if (!objp)
> > -		objp = ____cache_alloc_node(cache, flags, numa_mem_id());
> > -
> > +		objp = ____cache_alloc_node(cachep, flags, nodeid);
> 
> 
> Does this preserve the original behavior? nodeid is the parameter passed
> to __do_cache_alloc(). numa_mem_id() is the nearest memory node.

Yes it does preserve the original behavior.

nodeid equals the value of numa_mem_id() when nodeid was NUMA_NO_NODE and
____cache_alloc() failed to allocate.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()
  2022-07-13  9:25     ` Hyeonggon Yoo
@ 2022-07-13 10:07       ` Christoph Lameter
  2022-07-13 10:33         ` Marco Elver
  0 siblings, 1 reply; 49+ messages in thread
From: Christoph Lameter @ 2022-07-13 10:07 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Vlastimil Babka, Roman Gushchin, Joe Perches, Vasily Averin,
	Matthew WilCox, linux-kernel, linux-mm, Marco Elver

On Wed, 13 Jul 2022, Hyeonggon Yoo wrote:

> > Why return 0 if there is an error and why bother the callers with these
> > checks. BUG()?
>
> I thought BUG should be used when there is no other solution.

Spurious returns of 0 that the caller has to check for are a solution?

> > I guess this is an error since the order-0 page cannot come from slab
> > allocations.
>
> comment in ksize() says:
> 	"The caller must guarantee that objp points to a valid object
> 	previously allocated with either kmalloc() or kmem_cache_alloc()."
>
> It should not be used on order-0 page that is not allocated from slab. No?

I guess we would need to check. Code could exist that does this.

Getting a 0 size would be surprising too here. BUG()? Or WARN() and return
PAGE_SIZE.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()
  2022-07-13 10:07       ` Christoph Lameter
@ 2022-07-13 10:33         ` Marco Elver
  2022-07-14  9:15           ` Christoph Lameter
  0 siblings, 1 reply; 49+ messages in thread
From: Marco Elver @ 2022-07-13 10:33 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Hyeonggon Yoo, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Joe Perches,
	Vasily Averin, Matthew WilCox, linux-kernel, linux-mm

On Wed, 13 Jul 2022 at 12:07, Christoph Lameter <cl@gentwo.de> wrote:
>
> On Wed, 13 Jul 2022, Hyeonggon Yoo wrote:
>
> > > Why return 0 if there is an error and why bother the callers with these
> > > checks. BUG()?
> >
> > I thought BUG should be used when there is no other solution.
>
> Spurios returns of 0 that the caller has to check for is a solution?

It's never really been about the caller checking for 0, see below.

> > > I guess this is an error since the order-0 page cannot come from slab
> > > allocations.
> >
> > comment in ksize() says:
> >       "The caller must guarantee that objp points to a valid object
> >       previously allocated with either kmalloc() or kmem_cache_alloc()."
> >
> > It should not be used on an order-0 page that is not allocated from slab. No?
>
> I guess we would need to check. Code could exist that does this.
>
> Getting a 0 size would be surprising too here. BUG()? Or WARN() and return
> PAGE_SIZE.

We shouldn't crash, so it should be WARN(), but also returning
PAGE_SIZE is bad. The intuition behind returning 0 is to try and make
the buggy code cause less harm to the rest of the kernel.

From [1]:

> Similarly, if you are able to tell if the passed pointer is not a
> valid object some other way, you can do something better - namely,
> return 0. The intuition here is that the caller has a pointer to an
> invalid object, and wants to use ksize() to determine its size, and
> most likely access all those bytes. Arguably, at that point the kernel
> is already in a degrading state. But we can try to not let things get
> worse by having ksize() return 0, in the hopes that it will stop
> corrupting more memory. It won't work in all cases, but should avoid
> things like "s = ksize(obj); touch_all_bytes(obj, s)" where the size
> bounds the memory accessed corrupting random memory.

[1] https://lore.kernel.org/all/CANpmjNNYt9AG8RrGF0pq2dPbFc=vw2kaOnL2k5+8kfJeEMGuwg@mail.gmail.com/
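
To spell out the pattern from the quote, with memset() standing in for
the hypothetical touch_all_bytes():

	size_t s = ksize(obj);	/* 0 if obj is detected as invalid */

	memset(obj, 0, s);	/* with s == 0 this writes nothing */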

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()
  2022-07-13 10:33         ` Marco Elver
@ 2022-07-14  9:15           ` Christoph Lameter
  2022-07-14 10:30             ` Marco Elver
  0 siblings, 1 reply; 49+ messages in thread
From: Christoph Lameter @ 2022-07-14  9:15 UTC (permalink / raw)
  To: Marco Elver
  Cc: Hyeonggon Yoo, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Joe Perches,
	Vasily Averin, Matthew WilCox, linux-kernel, linux-mm

On Wed, 13 Jul 2022, Marco Elver wrote:

> We shouldn't crash, so it should be WARN(), but also returning
> PAGE_SIZE is bad. The intuition behind returning 0 is to try and make
> the buggy code cause less harm to the rest of the kernel.
>
> From [1]:
>
> > Similarly, if you are able to tell if the passed pointer is not a
> > valid object some other way, you can do something better - namely,
> > return 0. The intuition here is that the caller has a pointer to an
> > invalid object, and wants to use ksize() to determine its size, and
> > most likely access all those bytes. Arguably, at that point the kernel
> > is already in a degrading state. But we can try to not let things get
> > worse by having ksize() return 0, in the hopes that it will stop
> > corrupting more memory. It won't work in all cases, but should avoid
> > things like "s = ksize(obj); touch_all_bytes(obj, s)" where the size
> > bounds the memory accessed corrupting random memory.

"in the hopes that it will stop corrupting memory"!!!???

Do a BUG() then and definitely stop all chances of memory corruption.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()
  2022-07-14  9:15           ` Christoph Lameter
@ 2022-07-14 10:30             ` Marco Elver
  2022-07-20 10:05               ` Hyeonggon Yoo
  0 siblings, 1 reply; 49+ messages in thread
From: Marco Elver @ 2022-07-14 10:30 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Hyeonggon Yoo, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Joe Perches,
	Vasily Averin, Matthew WilCox, linux-kernel, linux-mm

On Thu, 14 Jul 2022 at 11:16, Christoph Lameter <cl@gentwo.de> wrote:
>
> On Wed, 13 Jul 2022, Marco Elver wrote:
>
> > We shouldn't crash, so it should be WARN(), but also returning
> > PAGE_SIZE is bad. The intuition behind returning 0 is to try and make
> > the buggy code cause less harm to the rest of the kernel.
> >
> > From [1]:
> >
> > > Similarly, if you are able to tell if the passed pointer is not a
> > > valid object some other way, you can do something better - namely,
> > > return 0. The intuition here is that the caller has a pointer to an
> > > invalid object, and wants to use ksize() to determine its size, and
> > > most likely access all those bytes. Arguably, at that point the kernel
> > > is already in a degrading state. But we can try to not let things get
> > > worse by having ksize() return 0, in the hopes that it will stop
> > > corrupting more memory. It won't work in all cases, but should avoid
> > > things like "s = ksize(obj); touch_all_bytes(obj, s)" where the size
> > > bounds the memory accessed corrupting random memory.
>
> "in the hopes that it will stop corrupting memory"!!!???
>
> Do a BUG() then and definitely stop all chances of memory corruption.

Fair enough.

Well, I'd also prefer to just kill the kernel. But some people don't
like that and want the option to continue running. So a WARN() gives
that option, and just have to boot the kernel with panic_on_warn to
kill it. There are other warnings in the kernel where we'd better kill
the kernel as the chances of corrupting memory are pretty damn high if
we hit them. And I still don't quite see why the case here is any more
or less special.

If the situation here is exceedingly rare, let's try BUG() and see what breaks?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()
  2022-07-14 10:30             ` Marco Elver
@ 2022-07-20 10:05               ` Hyeonggon Yoo
  0 siblings, 0 replies; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-07-20 10:05 UTC (permalink / raw)
  To: Marco Elver
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, Joe Perches,
	Vasily Averin, Matthew WilCox, linux-kernel, linux-mm

On Thu, Jul 14, 2022 at 12:30:09PM +0200, Marco Elver wrote:
> On Thu, 14 Jul 2022 at 11:16, Christoph Lameter <cl@gentwo.de> wrote:
> >
> > On Wed, 13 Jul 2022, Marco Elver wrote:
> >
> > > We shouldn't crash, so it should be WARN(), but also returning
> > > PAGE_SIZE is bad. The intuition behind returning 0 is to try and make
> > > the buggy code cause less harm to the rest of the kernel.
> > >
> > > From [1]:
> > >
> > > > Similarly, if you are able to tell if the passed pointer is not a
> > > > valid object some other way, you can do something better - namely,
> > > > return 0. The intuition here is that the caller has a pointer to an
> > > > invalid object, and wants to use ksize() to determine its size, and
> > > > most likely access all those bytes. Arguably, at that point the kernel
> > > > is already in a degrading state. But we can try to not let things get
> > > > worse by having ksize() return 0, in the hopes that it will stop
> > > > corrupting more memory. It won't work in all cases, but should avoid
> > > > things like "s = ksize(obj); touch_all_bytes(obj, s)" where the size
> > > > bounds the memory accessed corrupting random memory.
> >
> > "in the hopes that it will stop corrupting memory"!!!???
> >
> > Do a BUG() then and definitely stop all chances of memory corruption.
> 
> Fair enough.
> 
> Well, I'd also prefer to just kill the kernel. But some people don't
> like that and want the option to continue running. So a WARN() gives
> that option, and just have to boot the kernel with panic_on_warn to
> kill it. There are other warnings in the kernel where we'd better kill
> the kernel as the chances of corrupting memory are pretty damn high if
> we hit them. And I still don't quite see why the case here is any more
> or less special.
> 
> If the situation here is exceedingly rare, let's try BUG() and see what breaks?

Let's try BUG() for both conditions and replace it with WARN() later
if the kernel hits those often.

I'll update this patch in the next version.

And I have no strong opinion on returning 0, but if the kernel hits it a
lot, I think returning 0 would be safer, as you said.
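
For illustration, this is roughly what the non-slab branch of __ksize()
would look like with BUG() (only the BUG()-vs-WARN() choice is the point
here; the exact checks are the ones in the patch):

	folio = virt_to_folio(object);

	if (unlikely(!folio_test_slab(folio))) {
		/* e.g. an order-0 page can never be a large kmalloc() object */
		BUG_ON(folio_order(folio) == 0);
		return folio_size(folio);
	}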

Thanks,
Hyeonggon

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 05/15] mm/sl[au]b: factor out __do_kmalloc_node()
  2022-07-12 13:39 ` [PATCH v3 05/15] mm/sl[au]b: factor out __do_kmalloc_node() Hyeonggon Yoo
@ 2022-07-28 14:45   ` Vlastimil Babka
  0 siblings, 0 replies; 49+ messages in thread
From: Vlastimil Babka @ 2022-07-28 14:45 UTC (permalink / raw)
  To: Hyeonggon Yoo, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, Joe Perches,
	Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> __kmalloc(), __kmalloc_node(), __kmalloc_node_track_caller()
> mostly do the same job. Factor out common code into __do_kmalloc_node().
> 
> Note that this patch also fixes missing kasan_kmalloc() in SLUB's
> __kmalloc_node_track_caller().
> 
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/slab.c | 30 +----------------------
>  mm/slub.c | 71 +++++++++++++++----------------------------------------
>  2 files changed, 20 insertions(+), 81 deletions(-)
> 
> diff --git a/mm/slab.c b/mm/slab.c
> index da2f6a5dd8fa..ab34727d61b2 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3631,37 +3631,9 @@ void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
>  }
>  #endif
>  
> -/**
> - * __do_kmalloc - allocate memory
> - * @size: how many bytes of memory are required.
> - * @flags: the type of memory to allocate (see kmalloc).
> - * @caller: function caller for debug tracking of the caller
> - *
> - * Return: pointer to the allocated memory or %NULL in case of error
> - */
> -static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
> -					  unsigned long caller)
> -{
> -	struct kmem_cache *cachep;
> -	void *ret;
> -
> -	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
> -		return NULL;
> -	cachep = kmalloc_slab(size, flags);
> -	if (unlikely(ZERO_OR_NULL_PTR(cachep)))
> -		return cachep;
> -	ret = slab_alloc(cachep, NULL, flags, size, caller);
> -
> -	ret = kasan_kmalloc(cachep, ret, size, flags);
> -	trace_kmalloc(caller, ret, cachep,
> -		      size, cachep->size, flags);
> -
> -	return ret;
> -}
> -
>  void *__kmalloc(size_t size, gfp_t flags)
>  {
> -	return __do_kmalloc(size, flags, _RET_IP_);
> +	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
>  }
>  EXPORT_SYMBOL(__kmalloc);
>  
> diff --git a/mm/slub.c b/mm/slub.c
> index 7c284535a62b..2ccc473e0ae7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4402,29 +4402,6 @@ static int __init setup_slub_min_objects(char *str)
>  
>  __setup("slub_min_objects=", setup_slub_min_objects);
>  
> -void *__kmalloc(size_t size, gfp_t flags)
> -{
> -	struct kmem_cache *s;
> -	void *ret;
> -
> -	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
> -		return kmalloc_large(size, flags);
> -
> -	s = kmalloc_slab(size, flags);
> -
> -	if (unlikely(ZERO_OR_NULL_PTR(s)))
> -		return s;
> -
> -	ret = slab_alloc(s, NULL, flags, _RET_IP_, size);
> -
> -	trace_kmalloc(_RET_IP_, ret, s, size, s->size, flags);
> -
> -	ret = kasan_kmalloc(s, ret, size, flags);
> -
> -	return ret;
> -}
> -EXPORT_SYMBOL(__kmalloc);
> -
>  static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
>  {
>  	struct page *page;
> @@ -4442,7 +4419,8 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
>  	return kmalloc_large_node_hook(ptr, size, flags);
>  }
>  
> -void *__kmalloc_node(size_t size, gfp_t flags, int node)
> +static __always_inline
> +void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
>  {
>  	struct kmem_cache *s;
>  	void *ret;
> @@ -4450,7 +4428,7 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
>  	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
>  		ret = kmalloc_large_node(size, flags, node);
>  
> -		trace_kmalloc_node(_RET_IP_, ret, NULL,
> +		trace_kmalloc_node(caller, ret, NULL,
>  				   size, PAGE_SIZE << get_order(size),
>  				   flags, node);
>  
> @@ -4462,16 +4440,28 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
>  	if (unlikely(ZERO_OR_NULL_PTR(s)))
>  		return s;
>  
> -	ret = slab_alloc_node(s, NULL, flags, node, _RET_IP_, size);
> +	ret = slab_alloc_node(s, NULL, flags, node, caller, size);
>  
> -	trace_kmalloc_node(_RET_IP_, ret, s, size, s->size, flags, node);
> +	trace_kmalloc_node(caller, ret, s, size, s->size, flags, node);
>  
>  	ret = kasan_kmalloc(s, ret, size, flags);
>  
>  	return ret;
>  }
> +
> +void *__kmalloc_node(size_t size, gfp_t flags, int node)
> +{
> +	return __do_kmalloc_node(size, flags, node, _RET_IP_);
> +}
>  EXPORT_SYMBOL(__kmalloc_node);
>  
> +void *__kmalloc(size_t size, gfp_t flags)
> +{
> +	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
> +}
> +EXPORT_SYMBOL(__kmalloc);
> +
> +
>  #ifdef CONFIG_HARDENED_USERCOPY
>  /*
>   * Rejects incorrectly sized objects and objects that are to be copied
> @@ -4905,32 +4895,9 @@ int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags)
>  }
>  
>  void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
> -					int node, unsigned long caller)
> +				  int node, unsigned long caller)
>  {
> -	struct kmem_cache *s;
> -	void *ret;
> -
> -	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> -		ret = kmalloc_large_node(size, gfpflags, node);
> -
> -		trace_kmalloc_node(caller, ret, NULL,
> -				   size, PAGE_SIZE << get_order(size),
> -				   gfpflags, node);
> -
> -		return ret;
> -	}
> -
> -	s = kmalloc_slab(size, gfpflags);
> -
> -	if (unlikely(ZERO_OR_NULL_PTR(s)))
> -		return s;
> -
> -	ret = slab_alloc_node(s, NULL, gfpflags, node, caller, size);
> -
> -	/* Honor the call site pointer we received. */
> -	trace_kmalloc_node(caller, ret, s, size, s->size, gfpflags, node);
> -
> -	return ret;
> +	return __do_kmalloc_node(size, gfpflags, node, caller);
>  }
>  EXPORT_SYMBOL(__kmalloc_node_track_caller);
>  


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 06/15] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large()
  2022-07-12 13:39 ` [PATCH v3 06/15] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large() Hyeonggon Yoo
@ 2022-07-28 15:23   ` Vlastimil Babka
  2022-08-01 13:26     ` Hyeonggon Yoo
  0 siblings, 1 reply; 49+ messages in thread
From: Vlastimil Babka @ 2022-07-28 15:23 UTC (permalink / raw)
  To: Hyeonggon Yoo, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, Joe Perches,
	Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> There is no caller of kmalloc_order_trace() except kmalloc_large().
> Fold it into kmalloc_large() and remove kmalloc_order{,_trace}().
> 
> Also add the tracepoint in kmalloc_large() that was previously
> in kmalloc_order_trace().
> 
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Hmm, I noticed a small change in how we call trace_kmalloc(), which will now
include __GFP_COMP. I think we could just call alloc_pages() from
kmalloc_large() with flags | __GFP_COMP instead of doing flags |=
__GFP_COMP; first. AFAICS both kasan_kmalloc_large() and kmemleak_alloc()
will filter it out anyway.
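
I.e. something like this (just a sketch), so that trace_kmalloc() keeps
seeing the caller's flags:

	page = alloc_pages(flags | __GFP_COMP, order);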

> ---
>  include/linux/slab.h | 22 ++--------------------
>  mm/slab_common.c     | 14 +++-----------
>  2 files changed, 5 insertions(+), 31 deletions(-)
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index a0e57df3d5a4..15a4c59da59e 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -489,26 +489,8 @@ static __always_inline void *kmem_cache_alloc_node_trace(struct kmem_cache *s, g
>  }
>  #endif /* CONFIG_TRACING */
>  
> -extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) __assume_page_alignment
> -									 __alloc_size(1);
> -
> -#ifdef CONFIG_TRACING
> -extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
> -				__assume_page_alignment __alloc_size(1);
> -#else
> -static __always_inline __alloc_size(1) void *kmalloc_order_trace(size_t size, gfp_t flags,
> -								 unsigned int order)
> -{
> -	return kmalloc_order(size, flags, order);
> -}
> -#endif
> -
> -static __always_inline __alloc_size(1) void *kmalloc_large(size_t size, gfp_t flags)
> -{
> -	unsigned int order = get_order(size);
> -	return kmalloc_order_trace(size, flags, order);
> -}
> -
> +void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment
> +					      __alloc_size(1);
>  /**
>   * kmalloc - allocate memory
>   * @size: how many bytes of memory are required.
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 6c9aac5d8f4a..1f8af2106df0 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -932,10 +932,11 @@ gfp_t kmalloc_fix_flags(gfp_t flags)
>   * directly to the page allocator. We use __GFP_COMP, because we will need to
>   * know the allocation order to free the pages properly in kfree.
>   */
> -void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
> +void *kmalloc_large(size_t size, gfp_t flags)
>  {
>  	void *ret = NULL;
>  	struct page *page;
> +	unsigned int order = get_order(size);
>  
>  	if (unlikely(flags & GFP_SLAB_BUG_MASK))
>  		flags = kmalloc_fix_flags(flags);
> @@ -950,19 +951,10 @@ void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
>  	ret = kasan_kmalloc_large(ret, size, flags);
>  	/* As ret might get tagged, call kmemleak hook after KASAN. */
>  	kmemleak_alloc(ret, size, 1, flags);
> -	return ret;
> -}
> -EXPORT_SYMBOL(kmalloc_order);
> -
> -#ifdef CONFIG_TRACING
> -void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
> -{
> -	void *ret = kmalloc_order(size, flags, order);
>  	trace_kmalloc(_RET_IP_, ret, NULL, size, PAGE_SIZE << order, flags);
>  	return ret;
>  }
> -EXPORT_SYMBOL(kmalloc_order_trace);
> -#endif
> +EXPORT_SYMBOL(kmalloc_large);
>  
>  #ifdef CONFIG_SLAB_FREELIST_RANDOM
>  /* Randomize a generic freelist */


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 08/15] mm/slab_common: kmalloc_node: pass large requests to page allocator
  2022-07-12 13:39 ` [PATCH v3 08/15] mm/slab_common: kmalloc_node: pass large requests to page allocator Hyeonggon Yoo
@ 2022-07-28 16:09   ` Vlastimil Babka
  2022-08-01 14:37     ` Hyeonggon Yoo
  0 siblings, 1 reply; 49+ messages in thread
From: Vlastimil Babka @ 2022-07-28 16:09 UTC (permalink / raw)
  To: Hyeonggon Yoo, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, Joe Perches,
	Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> Now that kmalloc_large_node() is in common code, pass large requests
> to the page allocator in kmalloc_node() using kmalloc_large_node().
> 
> One problem is that currently there is no tracepoint in
> kmalloc_large_node(). Instead of simply putting a tracepoint in it,
> use kmalloc_large_node{,_notrace} depending on its caller to show a
> useful address for both inlined kmalloc_node() and
> __kmalloc_node_track_caller() when large objects are allocated.
> 
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Nit below:

> ---
>  v3:
> 	This patch is new in v3 and this avoids
> 	missing caller in __kmalloc_large_node_track_caller()
> 	when kmalloc_large_node() is called.
> 
>  include/linux/slab.h | 26 +++++++++++++++++++-------
>  mm/slab.h            |  2 ++
>  mm/slab_common.c     | 17 ++++++++++++++++-
>  mm/slub.c            |  2 +-
>  4 files changed, 38 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 082499306098..fd2e129fc813 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -571,23 +571,35 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
>  	return __kmalloc(size, flags);
>  }
>  
> +#ifndef CONFIG_SLOB
>  static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
>  {
> -#ifndef CONFIG_SLOB
> -	if (__builtin_constant_p(size) &&
> -		size <= KMALLOC_MAX_CACHE_SIZE) {
> -		unsigned int i = kmalloc_index(size);
> +	if (__builtin_constant_p(size)) {
> +		unsigned int index;
>  
> -		if (!i)
> +		if (size > KMALLOC_MAX_CACHE_SIZE)
> +			return kmalloc_large_node(size, flags, node);
> +
> +		index = kmalloc_index(size);
> +
> +		if (!index)
>  			return ZERO_SIZE_PTR;
>  
>  		return kmem_cache_alloc_node_trace(
> -				kmalloc_caches[kmalloc_type(flags)][i],
> +				kmalloc_caches[kmalloc_type(flags)][index],
>  						flags, node, size);
>  	}
> -#endif
>  	return __kmalloc_node(size, flags, node);
>  }
> +#else
> +static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
> +{
> +	if (__builtin_constant_p(size) && size > KMALLOC_MAX_CACHE_SIZE)
> +		return kmalloc_large_node(size, flags, node);
> +
> +	return __kmalloc_node(size, flags, node);
> +}
> +#endif
>  
>  /**
>   * kmalloc_array - allocate memory for an array.
> diff --git a/mm/slab.h b/mm/slab.h
> index a8d5eb1c323f..7cb51ff44f0c 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -273,6 +273,8 @@ void create_kmalloc_caches(slab_flags_t);
>  
>  /* Find the kmalloc slab corresponding for a certain size */
>  struct kmem_cache *kmalloc_slab(size_t, gfp_t);
> +
> +void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node);
>  #endif
>  
>  gfp_t kmalloc_fix_flags(gfp_t flags);
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 6f855587b635..dc872e0ef0fc 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -956,7 +956,8 @@ void *kmalloc_large(size_t size, gfp_t flags)
>  }
>  EXPORT_SYMBOL(kmalloc_large);
>  
> -void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> +static __always_inline

I don't think we need to inline, compiler should be able to make
kmalloc_large_node(_notrace) quite efficient anyway.

> +void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
>  {
>  	struct page *page;
>  	void *ptr = NULL;
> @@ -976,6 +977,20 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
>  
>  	return ptr;
>  }
> +
> +void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
> +{
> +	return __kmalloc_large_node_notrace(size, flags, node);
> +}
> +
> +void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> +{
> +	void *ret = __kmalloc_large_node_notrace(size, flags, node);
> +
> +	trace_kmalloc_node(_RET_IP_, ret, NULL, size,
> +			   PAGE_SIZE << get_order(size), flags, node);
> +	return ret;
> +}
>  EXPORT_SYMBOL(kmalloc_large_node);
>  
>  #ifdef CONFIG_SLAB_FREELIST_RANDOM
> diff --git a/mm/slub.c b/mm/slub.c
> index f22a84dd27de..3d02cf44adf7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4401,7 +4401,7 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
>  	void *ret;
>  
>  	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> -		ret = kmalloc_large_node(size, flags, node);
> +		ret = kmalloc_large_node_notrace(size, flags, node);
>  
>  		trace_kmalloc_node(caller, ret, NULL,
>  				   size, PAGE_SIZE << get_order(size),


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 09/15] mm/slab_common: cleanup kmalloc_large()
  2022-07-12 13:39 ` [PATCH v3 09/15] mm/slab_common: cleanup kmalloc_large() Hyeonggon Yoo
@ 2022-07-28 16:13   ` Vlastimil Babka
  0 siblings, 0 replies; 49+ messages in thread
From: Vlastimil Babka @ 2022-07-28 16:13 UTC (permalink / raw)
  To: Hyeonggon Yoo, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, Joe Perches,
	Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> Now that kmalloc_large() and kmalloc_large_node() do mostly the same job,
> make kmalloc_large() a wrapper of __kmalloc_large_node_notrace().
> 
> In the meantime, add the missing flag fix code in
> __kmalloc_large_node_notrace().
> 
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/slab_common.c | 36 +++++++++++++-----------------------
>  1 file changed, 13 insertions(+), 23 deletions(-)
> 
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index dc872e0ef0fc..9c46e2f9589f 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -932,29 +932,6 @@ gfp_t kmalloc_fix_flags(gfp_t flags)
>   * directly to the page allocator. We use __GFP_COMP, because we will need to
>   * know the allocation order to free the pages properly in kfree.
>   */
> -void *kmalloc_large(size_t size, gfp_t flags)
> -{
> -	void *ret = NULL;
> -	struct page *page;
> -	unsigned int order = get_order(size);
> -
> -	if (unlikely(flags & GFP_SLAB_BUG_MASK))
> -		flags = kmalloc_fix_flags(flags);
> -
> -	flags |= __GFP_COMP;
> -	page = alloc_pages(flags, order);
> -	if (likely(page)) {
> -		ret = page_address(page);
> -		mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
> -				      PAGE_SIZE << order);
> -	}
> -	ret = kasan_kmalloc_large(ret, size, flags);
> -	/* As ret might get tagged, call kmemleak hook after KASAN. */
> -	kmemleak_alloc(ret, size, 1, flags);
> -	trace_kmalloc(_RET_IP_, ret, NULL, size, PAGE_SIZE << order, flags);
> -	return ret;
> -}
> -EXPORT_SYMBOL(kmalloc_large);
>  
>  static __always_inline
>  void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
> @@ -963,6 +940,9 @@ void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
>  	void *ptr = NULL;
>  	unsigned int order = get_order(size);
>  
> +	if (unlikely(flags & GFP_SLAB_BUG_MASK))
> +		flags = kmalloc_fix_flags(flags);
> +
>  	flags |= __GFP_COMP;
>  	page = alloc_pages_node(node, flags, order);
>  	if (page) {
> @@ -978,6 +958,16 @@ void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
>  	return ptr;
>  }
>  
> +void *kmalloc_large(size_t size, gfp_t flags)
> +{
> +	void *ret = __kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
> +
> +	trace_kmalloc(_RET_IP_, ret, NULL, size,
> +		      PAGE_SIZE << get_order(size), flags);
> +	return ret;
> +}
> +EXPORT_SYMBOL(kmalloc_large);
> +
>  void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
>  {
>  	return __kmalloc_large_node_notrace(size, flags, node);


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 10/15] mm/slab: kmalloc: pass requests larger than order-1 page to page allocator
  2022-07-12 13:39 ` [PATCH v3 10/15] mm/slab: kmalloc: pass requests larger than order-1 page to page allocator Hyeonggon Yoo
@ 2022-07-28 16:25   ` Vlastimil Babka
  0 siblings, 0 replies; 49+ messages in thread
From: Vlastimil Babka @ 2022-07-28 16:25 UTC (permalink / raw)
  To: Hyeonggon Yoo, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, Joe Perches,
	Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> There is not much benefit in serving large objects in kmalloc().
> Let's pass large requests to the page allocator, like SLUB does, for better
> maintenance of common code.
> 
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
> v2->v3: 
> 	This patch is basically same but it now depends on
> 	kmalloc_large_node_notrace().
> 
>  include/linux/slab.h | 23 ++++-------------
>  mm/slab.c            | 60 +++++++++++++++++++++++++++++++-------------
>  mm/slab.h            |  3 +++
>  mm/slab_common.c     | 25 ++++++++++++------
>  mm/slub.c            | 19 --------------
>  5 files changed, 68 insertions(+), 62 deletions(-)
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index fd2e129fc813..4ee5b2fed164 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -243,27 +243,17 @@ static inline unsigned int arch_slab_minalign(void)
>  
>  #ifdef CONFIG_SLAB
>  /*
> - * The largest kmalloc size supported by the SLAB allocators is
> - * 32 megabyte (2^25) or the maximum allocatable page order if that is
> - * less than 32 MB.
> - *
> - * WARNING: Its not easy to increase this value since the allocators have
> - * to do various tricks to work around compiler limitations in order to
> - * ensure proper constant folding.
> + * SLAB and SLUB directly allocates requests fitting in to an order-1 page
> + * (PAGE_SIZE*2).  Larger requests are passed to the page allocator.
>   */
> -#define KMALLOC_SHIFT_HIGH	((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
> -				(MAX_ORDER + PAGE_SHIFT - 1) : 25)
> -#define KMALLOC_SHIFT_MAX	KMALLOC_SHIFT_HIGH
> +#define KMALLOC_SHIFT_HIGH	(PAGE_SHIFT + 1)
> +#define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT - 1)
>  #ifndef KMALLOC_SHIFT_LOW
>  #define KMALLOC_SHIFT_LOW	5
>  #endif
>  #endif
>  
>  #ifdef CONFIG_SLUB
> -/*
> - * SLUB directly allocates requests fitting in to an order-1 page
> - * (PAGE_SIZE*2).  Larger requests are passed to the page allocator.
> - */
>  #define KMALLOC_SHIFT_HIGH	(PAGE_SHIFT + 1)
>  #define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT - 1)
>  #ifndef KMALLOC_SHIFT_LOW
> @@ -415,10 +405,6 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
>  	if (size <= 512 * 1024) return 19;
>  	if (size <= 1024 * 1024) return 20;
>  	if (size <=  2 * 1024 * 1024) return 21;
> -	if (size <=  4 * 1024 * 1024) return 22;
> -	if (size <=  8 * 1024 * 1024) return 23;
> -	if (size <=  16 * 1024 * 1024) return 24;
> -	if (size <=  32 * 1024 * 1024) return 25;
>  
>  	if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>  		BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
> @@ -428,6 +414,7 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
>  	/* Will never be reached. Needed because the compiler may complain */
>  	return -1;
>  }
> +static_assert(PAGE_SHIFT <= 20);
>  #define kmalloc_index(s) __kmalloc_index(s, true)
>  #endif /* !CONFIG_SLOB */
>  
> diff --git a/mm/slab.c b/mm/slab.c
> index ab34727d61b2..a2f43425a0ae 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3585,11 +3585,19 @@ __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
>  	struct kmem_cache *cachep;
>  	void *ret;
>  
> -	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
> -		return NULL;
> +	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> +		ret = kmalloc_large_node_notrace(size, flags, node);
> +
> +		trace_kmalloc_node(caller, ret, NULL, size,
> +				   PAGE_SIZE << get_order(size),
> +				   flags, node);
> +		return ret;
> +	}
> +
>  	cachep = kmalloc_slab(size, flags);
>  	if (unlikely(ZERO_OR_NULL_PTR(cachep)))
>  		return cachep;
> +
>  	ret = kmem_cache_alloc_node_trace(cachep, flags, node, size);
>  	ret = kasan_kmalloc(cachep, ret, size, flags);
>  
> @@ -3664,17 +3672,27 @@ EXPORT_SYMBOL(kmem_cache_free);
>  
>  void kmem_cache_free_bulk(struct kmem_cache *orig_s, size_t size, void **p)
>  {
> -	struct kmem_cache *s;
> -	size_t i;
>  
>  	local_irq_disable();
> -	for (i = 0; i < size; i++) {
> +	for (int i = 0; i < size; i++) {
>  		void *objp = p[i];
> +		struct kmem_cache *s;
>  
> -		if (!orig_s) /* called via kfree_bulk */
> -			s = virt_to_cache(objp);
> -		else
> +		if (!orig_s) {
> +			struct folio *folio = virt_to_folio(objp);
> +
> +			/* called via kfree_bulk */
> +			if (!folio_test_slab(folio)) {
> +				local_irq_enable();
> +				free_large_kmalloc(folio, objp);
> +				local_irq_disable();
> +				continue;
> +			}
> +			s = folio_slab(folio)->slab_cache;
> +		} else {
>  			s = cache_from_obj(orig_s, objp);
> +		}
> +
>  		if (!s)
>  			continue;
>  
> @@ -3703,20 +3721,24 @@ void kfree(const void *objp)
>  {
>  	struct kmem_cache *c;
>  	unsigned long flags;
> +	struct folio *folio;
>  
>  	trace_kfree(_RET_IP_, objp);
>  
>  	if (unlikely(ZERO_OR_NULL_PTR(objp)))
>  		return;
> -	local_irq_save(flags);
> -	kfree_debugcheck(objp);
> -	c = virt_to_cache(objp);
> -	if (!c) {
> -		local_irq_restore(flags);
> +
> +	folio = virt_to_folio(objp);
> +	if (!folio_test_slab(folio)) {
> +		free_large_kmalloc(folio, (void *)objp);
>  		return;
>  	}
> -	debug_check_no_locks_freed(objp, c->object_size);
>  
> +	c = folio_slab(folio)->slab_cache;
> +
> +	local_irq_save(flags);
> +	kfree_debugcheck(objp);
> +	debug_check_no_locks_freed(objp, c->object_size);
>  	debug_check_no_obj_freed(objp, c->object_size);
>  	__cache_free(c, (void *)objp, _RET_IP_);
>  	local_irq_restore(flags);
> @@ -4138,15 +4160,17 @@ void __check_heap_object(const void *ptr, unsigned long n,
>  size_t __ksize(const void *objp)
>  {
>  	struct kmem_cache *c;
> -	size_t size;
> +	struct folio *folio;
>  
>  	BUG_ON(!objp);
>  	if (unlikely(objp == ZERO_SIZE_PTR))
>  		return 0;
>  
> -	c = virt_to_cache(objp);
> -	size = c ? c->object_size : 0;
> +	folio = virt_to_folio(objp);
> +	if (!folio_test_slab(folio))
> +		return folio_size(folio);
>  
> -	return size;
> +	c = folio_slab(folio)->slab_cache;
> +	return c->object_size;
>  }
>  EXPORT_SYMBOL(__ksize);
> diff --git a/mm/slab.h b/mm/slab.h
> index 7cb51ff44f0c..c81c92d421f1 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -669,6 +669,9 @@ static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
>  		print_tracking(cachep, x);
>  	return cachep;
>  }
> +
> +void free_large_kmalloc(struct folio *folio, void *object);
> +
>  #endif /* CONFIG_SLOB */
>  
>  static inline size_t slab_ksize(const struct kmem_cache *s)
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 9c46e2f9589f..0dff46fb7193 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -771,8 +771,8 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
>  
>  /*
>   * kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot time.
> - * kmalloc_index() supports up to 2^25=32MB, so the final entry of the table is
> - * kmalloc-32M.
> + * kmalloc_index() supports up to 2^21=2MB, so the final entry of the table is
> + * kmalloc-2M.
>   */
>  const struct kmalloc_info_struct kmalloc_info[] __initconst = {
>  	INIT_KMALLOC_INFO(0, 0),
> @@ -796,11 +796,7 @@ const struct kmalloc_info_struct kmalloc_info[] __initconst = {
>  	INIT_KMALLOC_INFO(262144, 256k),
>  	INIT_KMALLOC_INFO(524288, 512k),
>  	INIT_KMALLOC_INFO(1048576, 1M),
> -	INIT_KMALLOC_INFO(2097152, 2M),
> -	INIT_KMALLOC_INFO(4194304, 4M),
> -	INIT_KMALLOC_INFO(8388608, 8M),
> -	INIT_KMALLOC_INFO(16777216, 16M),
> -	INIT_KMALLOC_INFO(33554432, 32M)
> +	INIT_KMALLOC_INFO(2097152, 2M)
>  };
>  
>  /*
> @@ -913,6 +909,21 @@ void __init create_kmalloc_caches(slab_flags_t flags)
>  	/* Kmalloc array is now usable */
>  	slab_state = UP;
>  }
> +
> +void free_large_kmalloc(struct folio *folio, void *object)
> +{
> +	unsigned int order = folio_order(folio);
> +
> +	if (WARN_ON_ONCE(order == 0))
> +		pr_warn_once("object pointer: 0x%p\n", object);
> +
> +	kmemleak_free(object);
> +	kasan_kfree_large(object);
> +
> +	mod_lruvec_page_state(folio_page(folio, 0), NR_SLAB_UNRECLAIMABLE_B,
> +			      -(PAGE_SIZE << order));
> +	__free_pages(folio_page(folio, 0), order);
> +}
>  #endif /* !CONFIG_SLOB */
>  
>  gfp_t kmalloc_fix_flags(gfp_t flags)
> diff --git a/mm/slub.c b/mm/slub.c
> index 3d02cf44adf7..6cb7ca27f3b7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1704,12 +1704,6 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
>   * Hooks for other subsystems that check memory allocations. In a typical
>   * production configuration these hooks all should produce no code at all.
>   */
> -static __always_inline void kfree_hook(void *x)
> -{
> -	kmemleak_free(x);
> -	kasan_kfree_large(x);
> -}
> -
>  static __always_inline bool slab_free_hook(struct kmem_cache *s,
>  						void *x, bool init)
>  {
> @@ -3550,19 +3544,6 @@ struct detached_freelist {
>  	struct kmem_cache *s;
>  };
>  
> -static inline void free_large_kmalloc(struct folio *folio, void *object)
> -{
> -	unsigned int order = folio_order(folio);
> -
> -	if (WARN_ON_ONCE(order == 0))
> -		pr_warn_once("object pointer: 0x%p\n", object);
> -
> -	kfree_hook(object);
> -	mod_lruvec_page_state(folio_page(folio, 0), NR_SLAB_UNRECLAIMABLE_B,
> -			      -(PAGE_SIZE << order));
> -	__free_pages(folio_page(folio, 0), order);
> -}
> -
>  /*
>   * This function progressively scans the array with free objects (with
>   * a limited look ahead) and extract objects belonging to the same


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 11/15] mm/sl[au]b: introduce common alloc/free functions without tracepoint
  2022-07-12 13:39 ` [PATCH v3 11/15] mm/sl[au]b: introduce common alloc/free functions without tracepoint Hyeonggon Yoo
@ 2022-07-29  9:49   ` Vlastimil Babka
  0 siblings, 0 replies; 49+ messages in thread
From: Vlastimil Babka @ 2022-07-29  9:49 UTC (permalink / raw)
  To: Hyeonggon Yoo, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, Joe Perches,
	Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> To unify kmalloc functions in a later patch, introduce common alloc/free
> functions that do not have tracepoints.
> 
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
> 
> v3: 
>    Tried to avoid affecting existing functions.
> 
>  mm/slab.c | 36 +++++++++++++++++++++++++++++-------
>  mm/slab.h |  4 ++++
>  mm/slub.c | 13 +++++++++++++
>  3 files changed, 46 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/slab.c b/mm/slab.c
> index a2f43425a0ae..375e35c14430 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3560,6 +3560,14 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
>  }
>  EXPORT_SYMBOL(kmem_cache_alloc_node);
>  
> +void *__kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
> +			     int nodeid, size_t orig_size,
> +			     unsigned long caller)
> +{
> +	return slab_alloc_node(cachep, NULL, flags, nodeid,
> +			       orig_size, caller);
> +}
> +
>  #ifdef CONFIG_TRACING
>  void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
>  				  gfp_t flags,
> @@ -3645,6 +3653,26 @@ void *__kmalloc(size_t size, gfp_t flags)
>  }
>  EXPORT_SYMBOL(__kmalloc);
>  
> +static __always_inline
> +void __do_kmem_cache_free(struct kmem_cache *cachep, void *objp,
> +			  unsigned long caller)
> +{
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +	debug_check_no_locks_freed(objp, cachep->object_size);
> +	if (!(cachep->flags & SLAB_DEBUG_OBJECTS))
> +		debug_check_no_obj_freed(objp, cachep->object_size);
> +	__cache_free(cachep, objp, caller);
> +	local_irq_restore(flags);
> +}
> +
> +void __kmem_cache_free(struct kmem_cache *cachep, void *objp,
> +		       unsigned long caller)
> +{
> +	__do_kmem_cache_free(cachep, objp, caller);
> +}
> +
>  /**
>   * kmem_cache_free - Deallocate an object
>   * @cachep: The cache the allocation was from.
> @@ -3655,18 +3683,12 @@ EXPORT_SYMBOL(__kmalloc);
>   */
>  void kmem_cache_free(struct kmem_cache *cachep, void *objp)
>  {
> -	unsigned long flags;
>  	cachep = cache_from_obj(cachep, objp);
>  	if (!cachep)
>  		return;
>  
>  	trace_kmem_cache_free(_RET_IP_, objp, cachep->name);
> -	local_irq_save(flags);
> -	debug_check_no_locks_freed(objp, cachep->object_size);
> -	if (!(cachep->flags & SLAB_DEBUG_OBJECTS))
> -		debug_check_no_obj_freed(objp, cachep->object_size);
> -	__cache_free(cachep, objp, _RET_IP_);
> -	local_irq_restore(flags);
> +	__do_kmem_cache_free(cachep, objp, _RET_IP_);
>  }
>  EXPORT_SYMBOL(kmem_cache_free);
>  
> diff --git a/mm/slab.h b/mm/slab.h
> index c81c92d421f1..9193e9c1f040 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -275,6 +275,10 @@ void create_kmalloc_caches(slab_flags_t);
>  struct kmem_cache *kmalloc_slab(size_t, gfp_t);
>  
>  void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node);
> +void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
> +			      int node, size_t orig_size,
> +			      unsigned long caller);
> +void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller);
>  #endif
>  
>  gfp_t kmalloc_fix_flags(gfp_t flags);
> diff --git a/mm/slub.c b/mm/slub.c
> index 6cb7ca27f3b7..74eb78678c98 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3262,6 +3262,14 @@ void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
>  }
>  EXPORT_SYMBOL(kmem_cache_alloc_lru);
>  
> +void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
> +			      int node, size_t orig_size,
> +			      unsigned long caller)
> +{
> +	return slab_alloc_node(s, NULL, gfpflags, node,
> +			       caller, orig_size);
> +}
> +
>  #ifdef CONFIG_TRACING
>  void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
>  {
> @@ -3526,6 +3534,11 @@ void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr)
>  }
>  #endif
>  
> +void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller)
> +{
> +	slab_free(s, virt_to_slab(x), x, NULL, &x, 1, caller);
> +}
> +
>  void kmem_cache_free(struct kmem_cache *s, void *x)
>  {
>  	s = cache_from_obj(s, x);


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 12/15] mm/sl[au]b: generalize kmalloc subsystem
  2022-07-12 13:39 ` [PATCH v3 12/15] mm/sl[au]b: generalize kmalloc subsystem Hyeonggon Yoo
@ 2022-07-29 10:25   ` Vlastimil Babka
  0 siblings, 0 replies; 49+ messages in thread
From: Vlastimil Babka @ 2022-07-29 10:25 UTC (permalink / raw)
  To: Hyeonggon Yoo, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, Joe Perches,
	Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> Now everything in the kmalloc subsystem can be generalized.
> Let's do it!
> 
> Generalize __do_kmalloc_node(), __kmalloc_node_track_caller(),
> kfree(), __ksize(), __kmalloc(), __kmalloc_node() and move them
> to slab_common.c.
> 
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/slab.c        | 108 -----------------------------------------------
>  mm/slab_common.c | 102 ++++++++++++++++++++++++++++++++++++++++++++
>  mm/slub.c        |  87 --------------------------------------
>  3 files changed, 102 insertions(+), 195 deletions(-)
> 
> diff --git a/mm/slab.c b/mm/slab.c
> index 375e35c14430..6407dad13d5c 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3587,44 +3587,6 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
>  EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
>  #endif
>  
> -static __always_inline void *
> -__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
> -{
> -	struct kmem_cache *cachep;
> -	void *ret;
> -
> -	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> -		ret = kmalloc_large_node_notrace(size, flags, node);
> -
> -		trace_kmalloc_node(caller, ret, NULL, size,
> -				   PAGE_SIZE << get_order(size),
> -				   flags, node);
> -		return ret;
> -	}
> -
> -	cachep = kmalloc_slab(size, flags);
> -	if (unlikely(ZERO_OR_NULL_PTR(cachep)))
> -		return cachep;
> -
> -	ret = kmem_cache_alloc_node_trace(cachep, flags, node, size);
> -	ret = kasan_kmalloc(cachep, ret, size, flags);
> -
> -	return ret;
> -}
> -
> -void *__kmalloc_node(size_t size, gfp_t flags, int node)
> -{
> -	return __do_kmalloc_node(size, flags, node, _RET_IP_);
> -}
> -EXPORT_SYMBOL(__kmalloc_node);
> -
> -void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
> -		int node, unsigned long caller)
> -{
> -	return __do_kmalloc_node(size, flags, node, caller);
> -}
> -EXPORT_SYMBOL(__kmalloc_node_track_caller);
> -
>  #ifdef CONFIG_PRINTK
>  void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
>  {
> @@ -3647,12 +3609,6 @@ void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
>  }
>  #endif
>  
> -void *__kmalloc(size_t size, gfp_t flags)
> -{
> -	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
> -}
> -EXPORT_SYMBOL(__kmalloc);
> -
>  static __always_inline
>  void __do_kmem_cache_free(struct kmem_cache *cachep, void *objp,
>  			  unsigned long caller)
> @@ -3730,43 +3686,6 @@ void kmem_cache_free_bulk(struct kmem_cache *orig_s, size_t size, void **p)
>  }
>  EXPORT_SYMBOL(kmem_cache_free_bulk);
>  
> -/**
> - * kfree - free previously allocated memory
> - * @objp: pointer returned by kmalloc.
> - *
> - * If @objp is NULL, no operation is performed.
> - *
> - * Don't free memory not originally allocated by kmalloc()
> - * or you will run into trouble.
> - */
> -void kfree(const void *objp)
> -{
> -	struct kmem_cache *c;
> -	unsigned long flags;
> -	struct folio *folio;
> -
> -	trace_kfree(_RET_IP_, objp);
> -
> -	if (unlikely(ZERO_OR_NULL_PTR(objp)))
> -		return;
> -
> -	folio = virt_to_folio(objp);
> -	if (!folio_test_slab(folio)) {
> -		free_large_kmalloc(folio, (void *)objp);
> -		return;
> -	}
> -
> -	c = folio_slab(folio)->slab_cache;
> -
> -	local_irq_save(flags);
> -	kfree_debugcheck(objp);
> -	debug_check_no_locks_freed(objp, c->object_size);
> -	debug_check_no_obj_freed(objp, c->object_size);
> -	__cache_free(c, (void *)objp, _RET_IP_);
> -	local_irq_restore(flags);
> -}
> -EXPORT_SYMBOL(kfree);
> -
>  /*
>   * This initializes kmem_cache_node or resizes various caches for all nodes.
>   */
> @@ -4169,30 +4088,3 @@ void __check_heap_object(const void *ptr, unsigned long n,
>  	usercopy_abort("SLAB object", cachep->name, to_user, offset, n);
>  }
>  #endif /* CONFIG_HARDENED_USERCOPY */
> -
> -/**
> - * __ksize -- Uninstrumented ksize.
> - * @objp: pointer to the object
> - *
> - * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
> - * safety checks as ksize() with KASAN instrumentation enabled.
> - *
> - * Return: size of the actual memory used by @objp in bytes
> - */
> -size_t __ksize(const void *objp)
> -{
> -	struct kmem_cache *c;
> -	struct folio *folio;
> -
> -	BUG_ON(!objp);
> -	if (unlikely(objp == ZERO_SIZE_PTR))
> -		return 0;
> -
> -	folio = virt_to_folio(objp);
> -	if (!folio_test_slab(folio))
> -		return folio_size(folio);
> -
> -	c = folio_slab(folio)->slab_cache;
> -	return c->object_size;
> -}
> -EXPORT_SYMBOL(__ksize);
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 0dff46fb7193..1000e05c77df 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -924,6 +924,108 @@ void free_large_kmalloc(struct folio *folio, void *object)
>  			      -(PAGE_SIZE << order));
>  	__free_pages(folio_page(folio, 0), order);
>  }
> +
> +static __always_inline
> +void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
> +{
> +	struct kmem_cache *s;
> +	void *ret;
> +
> +	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> +		ret = kmalloc_large_node_notrace(size, flags, node);
> +		trace_kmalloc_node(caller, ret, NULL,
> +				   size, PAGE_SIZE << get_order(size),
> +				   flags, node);
> +		return ret;
> +	}
> +
> +	s = kmalloc_slab(size, flags);
> +
> +	if (unlikely(ZERO_OR_NULL_PTR(s)))
> +		return s;
> +
> +	ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
> +	ret = kasan_kmalloc(s, ret, size, flags);
> +	trace_kmalloc_node(caller, ret, s, size,
> +			   s->size, flags, node);
> +	return ret;
> +}
> +
> +void *__kmalloc_node(size_t size, gfp_t flags, int node)
> +{
> +	return __do_kmalloc_node(size, flags, node, _RET_IP_);
> +}
> +EXPORT_SYMBOL(__kmalloc_node);
> +
> +void *__kmalloc(size_t size, gfp_t flags)
> +{
> +	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
> +}
> +EXPORT_SYMBOL(__kmalloc);
> +
> +void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
> +				  int node, unsigned long caller)
> +{
> +	return __do_kmalloc_node(size, flags, node, caller);
> +}
> +EXPORT_SYMBOL(__kmalloc_node_track_caller);
> +
> +/**
> + * kfree - free previously allocated memory
> + * @objp: pointer returned by kmalloc.
> + *
> + * If @objp is NULL, no operation is performed.
> + *
> + * Don't free memory not originally allocated by kmalloc()
> + * or you will run into trouble.
> + */
> +void kfree(const void *object)
> +{
> +	struct folio *folio;
> +	struct slab *slab;
> +	struct kmem_cache *s;
> +
> +	trace_kfree(_RET_IP_, object);
> +
> +	if (unlikely(ZERO_OR_NULL_PTR(object)))
> +		return;
> +
> +	folio = virt_to_folio(object);
> +	if (unlikely(!folio_test_slab(folio))) {
> +		free_large_kmalloc(folio, (void *)object);
> +		return;
> +	}
> +
> +	slab = folio_slab(folio);
> +	s = slab->slab_cache;
> +	__kmem_cache_free(s, (void *)object, _RET_IP_);
> +}
> +EXPORT_SYMBOL(kfree);
> +
> +/**
> + * __ksize -- Uninstrumented ksize.
> + * @objp: pointer to the object
> + *
> + * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
> + * safety checks as ksize() with KASAN instrumentation enabled.
> + *
> + * Return: size of the actual memory used by @objp in bytes
> + */
> +size_t __ksize(const void *object)
> +{
> +	struct folio *folio;
> +
> +	if (unlikely(object == ZERO_SIZE_PTR))
> +		return 0;
> +
> +	folio = virt_to_folio(object);
> +
> +	if (unlikely(!folio_test_slab(folio)))
> +		return folio_size(folio);
> +
> +	return slab_ksize(folio_slab(folio)->slab_cache);
> +}
> +EXPORT_SYMBOL(__ksize);
>  #endif /* !CONFIG_SLOB */
>  
>  gfp_t kmalloc_fix_flags(gfp_t flags)
> diff --git a/mm/slub.c b/mm/slub.c
> index 74eb78678c98..836292c32e58 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4388,49 +4388,6 @@ static int __init setup_slub_min_objects(char *str)
>  
>  __setup("slub_min_objects=", setup_slub_min_objects);
>  
> -static __always_inline
> -void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
> -{
> -	struct kmem_cache *s;
> -	void *ret;
> -
> -	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> -		ret = kmalloc_large_node_notrace(size, flags, node);
> -
> -		trace_kmalloc_node(caller, ret, NULL,
> -				   size, PAGE_SIZE << get_order(size),
> -				   flags, node);
> -
> -		return ret;
> -	}
> -
> -	s = kmalloc_slab(size, flags);
> -
> -	if (unlikely(ZERO_OR_NULL_PTR(s)))
> -		return s;
> -
> -	ret = slab_alloc_node(s, NULL, flags, node, caller, size);
> -
> -	trace_kmalloc_node(caller, ret, s, size, s->size, flags, node);
> -
> -	ret = kasan_kmalloc(s, ret, size, flags);
> -
> -	return ret;
> -}
> -
> -void *__kmalloc_node(size_t size, gfp_t flags, int node)
> -{
> -	return __do_kmalloc_node(size, flags, node, _RET_IP_);
> -}
> -EXPORT_SYMBOL(__kmalloc_node);
> -
> -void *__kmalloc(size_t size, gfp_t flags)
> -{
> -	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
> -}
> -EXPORT_SYMBOL(__kmalloc);
> -
> -
>  #ifdef CONFIG_HARDENED_USERCOPY
>  /*
>   * Rejects incorrectly sized objects and objects that are to be copied
> @@ -4481,43 +4438,6 @@ void __check_heap_object(const void *ptr, unsigned long n,
>  }
>  #endif /* CONFIG_HARDENED_USERCOPY */
>  
> -size_t __ksize(const void *object)
> -{
> -	struct folio *folio;
> -
> -	if (unlikely(object == ZERO_SIZE_PTR))
> -		return 0;
> -
> -	folio = virt_to_folio(object);
> -
> -	if (unlikely(!folio_test_slab(folio)))
> -		return folio_size(folio);
> -
> -	return slab_ksize(folio_slab(folio)->slab_cache);
> -}
> -EXPORT_SYMBOL(__ksize);
> -
> -void kfree(const void *x)
> -{
> -	struct folio *folio;
> -	struct slab *slab;
> -	void *object = (void *)x;
> -
> -	trace_kfree(_RET_IP_, x);
> -
> -	if (unlikely(ZERO_OR_NULL_PTR(x)))
> -		return;
> -
> -	folio = virt_to_folio(x);
> -	if (unlikely(!folio_test_slab(folio))) {
> -		free_large_kmalloc(folio, object);
> -		return;
> -	}
> -	slab = folio_slab(folio);
> -	slab_free(slab->slab_cache, slab, object, NULL, &object, 1, _RET_IP_);
> -}
> -EXPORT_SYMBOL(kfree);
> -
>  #define SHRINK_PROMOTE_MAX 32
>  
>  /*
> @@ -4863,13 +4783,6 @@ int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags)
>  	return 0;
>  }
>  
> -void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
> -				  int node, unsigned long caller)
> -{
> -	return __do_kmalloc_node(size, gfpflags, node, caller);
> -}
> -EXPORT_SYMBOL(__kmalloc_node_track_caller);
> -
>  #ifdef CONFIG_SYSFS
>  static int count_inuse(struct slab *slab)
>  {


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 13/15] mm/slab_common: unify NUMA and UMA version of tracepoints
  2022-07-12 13:39 ` [PATCH v3 13/15] mm/slab_common: unify NUMA and UMA version of tracepoints Hyeonggon Yoo
@ 2022-07-29 10:52   ` Vlastimil Babka
  0 siblings, 0 replies; 49+ messages in thread
From: Vlastimil Babka @ 2022-07-29 10:52 UTC (permalink / raw)
  To: Hyeonggon Yoo, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, Joe Perches,
	Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> Drop the kmem_alloc event class, rename kmem_alloc_node to kmem_alloc, and
> remove the _node postfix from the NUMA version of the tracepoints.
> 
> This will break some tools that depend on {kmem_cache_alloc,kmalloc}_node,
> but at this point maintaining both the kmem_alloc and kmem_alloc_node
> event classes does not make sense at all.
> 
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
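
For reference, a caller after this rename looks like this (copied from
the mm/slab.c context that is visible later in this series):

	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, cachep->object_size,
			       cachep->size, flags, NUMA_NO_NODE);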

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 14/16] mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using
  2022-07-12 13:39 ` [PATCH 14/16] mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using Hyeonggon Yoo
@ 2022-07-29 11:23   ` Vlastimil Babka
  2022-08-02  9:22     ` Hyeonggon Yoo
  0 siblings, 1 reply; 49+ messages in thread
From: Vlastimil Babka @ 2022-07-29 11:23 UTC (permalink / raw)
  To: Hyeonggon Yoo, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, Joe Perches,
	Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> Drop the kmem_alloc event class, and define kmalloc and kmem_cache_alloc
> using the TRACE_EVENT() macro.
> 
> Then this patch does the following:
>    - Do not pass a pointer to struct kmem_cache to trace_kmalloc.
>      The gfp flag is enough to know if the allocation is accounted or not.
>    - Avoid dereferencing s->object_size and s->size when not using the kmem_cache_alloc event.
>    - Avoid dereferencing s->name when not using the kmem_cache_free event.
> 
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

With some comments:

> ---
>  include/trace/events/kmem.h | 64 +++++++++++++++++++++++++------------
>  mm/slab.c                   | 16 ++++------
>  mm/slab_common.c            | 17 ++++++----
>  mm/slob.c                   | 16 +++-------
>  mm/slub.c                   | 13 +++-----
>  5 files changed, 70 insertions(+), 56 deletions(-)
> 
> diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
> index e078ebcdc4b1..0ded2c351062 100644
> --- a/include/trace/events/kmem.h
> +++ b/include/trace/events/kmem.h
> @@ -9,17 +9,15 @@
>  #include <linux/tracepoint.h>
>  #include <trace/events/mmflags.h>
>  
> -DECLARE_EVENT_CLASS(kmem_alloc,
> +TRACE_EVENT(kmem_cache_alloc,
>  
>  	TP_PROTO(unsigned long call_site,
>  		 const void *ptr,
>  		 struct kmem_cache *s,
> -		 size_t bytes_req,
> -		 size_t bytes_alloc,
>  		 gfp_t gfp_flags,
>  		 int node),
>  
> -	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node),
> +	TP_ARGS(call_site, ptr, s, gfp_flags, node),
>  
>  	TP_STRUCT__entry(
>  		__field(	unsigned long,	call_site	)
> @@ -34,8 +32,8 @@ DECLARE_EVENT_CLASS(kmem_alloc,
>  	TP_fast_assign(
>  		__entry->call_site	= call_site;
>  		__entry->ptr		= ptr;
> -		__entry->bytes_req	= bytes_req;
> -		__entry->bytes_alloc	= bytes_alloc;
> +		__entry->bytes_req	= s->object_size;
> +		__entry->bytes_alloc	= s->size;
>  		__entry->gfp_flags	= (__force unsigned long)gfp_flags;
>  		__entry->node		= node;
>  		__entry->accounted	= IS_ENABLED(CONFIG_MEMCG_KMEM) ?

This assignment continues by ...
	(s && s->flags & SLAB_ACCOUNT)) : false;

We could now remove the "s &&" as we assume it's non-NULL anyway.
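
I.e. the quoted continuation would then simply become:

	(s->flags & SLAB_ACCOUNT)) : false;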

> @@ -53,22 +51,46 @@ DECLARE_EVENT_CLASS(kmem_alloc,
>  		__entry->accounted ? "true" : "false")
>  );
>  
> -DEFINE_EVENT(kmem_alloc, kmalloc,
> +TRACE_EVENT(kmalloc,
>  
> -	TP_PROTO(unsigned long call_site, const void *ptr,
> -		 struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
> -		 gfp_t gfp_flags, int node),
> +	TP_PROTO(unsigned long call_site,
> +		 const void *ptr,
> +		 size_t bytes_req,
> +		 size_t bytes_alloc,
> +		 gfp_t gfp_flags,
> +		 int node),
>  
> -	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
> -);
> +	TP_ARGS(call_site, ptr, bytes_req, bytes_alloc, gfp_flags, node),
>  
> -DEFINE_EVENT(kmem_alloc, kmem_cache_alloc,
> +	TP_STRUCT__entry(
> +		__field(	unsigned long,	call_site	)
> +		__field(	const void *,	ptr		)
> +		__field(	size_t,		bytes_req	)
> +		__field(	size_t,		bytes_alloc	)
> +		__field(	unsigned long,	gfp_flags	)
> +		__field(	int,		node		)
> +		__field(	bool,		accounted	)
> +	),
>  
> -	TP_PROTO(unsigned long call_site, const void *ptr,
> -		 struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
> -		 gfp_t gfp_flags, int node),
> +	TP_fast_assign(
> +		__entry->call_site	= call_site;
> +		__entry->ptr		= ptr;
> +		__entry->bytes_req	= bytes_req;
> +		__entry->bytes_alloc	= bytes_alloc;
> +		__entry->gfp_flags	= (__force unsigned long)gfp_flags;
> +		__entry->node		= node;
> +		__entry->accounted	= IS_ENABLED(CONFIG_MEMCG_KMEM) ?
> +					  (gfp_flags & __GFP_ACCOUNT) : false;

This test could perhaps be moved to TP_printk and the accounted field
removed completely, but we need to make sure trace-cmd (which reads the
binary entries and does the printk in userspace) can interpret the tests
properly, including the IS_ENABLED(CONFIG_MEMCG_KMEM) part.
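
Just to illustrate the idea (a sketch only; whether trace-cmd's userspace
parser can handle both the flag test and the IS_ENABLED() part is exactly
the open question above):

	TP_printk("call_site=%pS ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d accounted=%s",
		(void *)__entry->call_site,
		__entry->ptr,
		__entry->bytes_req,
		__entry->bytes_alloc,
		show_gfp_flags(__entry->gfp_flags),
		__entry->node,
		(IS_ENABLED(CONFIG_MEMCG_KMEM) &&
		 (__entry->gfp_flags & (__force unsigned long)__GFP_ACCOUNT)) ?
			"true" : "false")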

> +	),
>  
> -	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
> +	TP_printk("call_site=%pS ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d accounted=%s",
> +		(void *)__entry->call_site,
> +		__entry->ptr,
> +		__entry->bytes_req,
> +		__entry->bytes_alloc,
> +		show_gfp_flags(__entry->gfp_flags),
> +		__entry->node,
> +		__entry->accounted ? "true" : "false")
>  );
>  
>  TRACE_EVENT(kfree,
> @@ -93,20 +115,20 @@ TRACE_EVENT(kfree,
>  
>  TRACE_EVENT(kmem_cache_free,
>  
> -	TP_PROTO(unsigned long call_site, const void *ptr, const char *name),
> +	TP_PROTO(unsigned long call_site, const void *ptr, const struct kmem_cache *s),
>  
> -	TP_ARGS(call_site, ptr, name),
> +	TP_ARGS(call_site, ptr, s),
>  
>  	TP_STRUCT__entry(
>  		__field(	unsigned long,	call_site	)
>  		__field(	const void *,	ptr		)
> -		__string(	name,	name	)
> +		__string(	name,		s->name		)
>  	),
>  
>  	TP_fast_assign(
>  		__entry->call_site	= call_site;
>  		__entry->ptr		= ptr;
> -		__assign_str(name, name);
> +		__assign_str(name, s->name);
>  	),
>  
>  	TP_printk("call_site=%pS ptr=%p name=%s",
> diff --git a/mm/slab.c b/mm/slab.c
> index f9b74831e3f4..1685e5507a59 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3447,8 +3447,7 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
>  {
>  	void *ret = slab_alloc(cachep, lru, flags, cachep->object_size, _RET_IP_);
>  
> -	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, cachep->object_size,
> -			       cachep->size, flags, NUMA_NO_NODE);
> +	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, flags, NUMA_NO_NODE);
>  
>  	return ret;
>  }
> @@ -3537,8 +3536,9 @@ kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
>  	ret = slab_alloc(cachep, NULL, flags, size, _RET_IP_);
>  
>  	ret = kasan_kmalloc(cachep, ret, size, flags);
> -	trace_kmalloc(_RET_IP_, ret, cachep,
> -		      size, cachep->size, flags, NUMA_NO_NODE);
> +	trace_kmalloc(_RET_IP_, ret,
> +		      size, cachep->size,
> +		      flags, NUMA_NO_NODE);

This could now fit on two, maybe even one line?
Other trace_* calls could perhaps also be reduced, please check.
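
E.g. (just showing the shorter form, same arguments as above):

	trace_kmalloc(_RET_IP_, ret, size, cachep->size, flags, NUMA_NO_NODE);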

>  	return ret;
>  }
>  EXPORT_SYMBOL(kmem_cache_alloc_trace);
> @@ -3561,9 +3561,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
>  {
>  	void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);
>  
> -	trace_kmem_cache_alloc(_RET_IP_, ret, cachep,
> -			       cachep->object_size, cachep->size,
> -			       flags, nodeid);
> +	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, flags, nodeid);
>  
>  	return ret;
>  }
> @@ -3588,7 +3586,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
>  	ret = slab_alloc_node(cachep, NULL, flags, nodeid, size, _RET_IP_);
>  
>  	ret = kasan_kmalloc(cachep, ret, size, flags);
> -	trace_kmalloc(_RET_IP_, ret, cachep,
> +	trace_kmalloc(_RET_IP_, ret,
>  		      size, cachep->size,
>  		      flags, nodeid);
>  	return ret;
> @@ -3652,7 +3650,7 @@ void kmem_cache_free(struct kmem_cache *cachep, void *objp)
>  	if (!cachep)
>  		return;
>  
> -	trace_kmem_cache_free(_RET_IP_, objp, cachep->name);
> +	trace_kmem_cache_free(_RET_IP_, objp, cachep);
>  	__do_kmem_cache_free(cachep, objp, _RET_IP_);
>  }
>  EXPORT_SYMBOL(kmem_cache_free);
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 0e66b4911ebf..c01c6b8f0d34 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -933,7 +933,7 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
>  
>  	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
>  		ret = kmalloc_large_node_notrace(size, flags, node);
> -		trace_kmalloc(_RET_IP_, ret, NULL,
> +		trace_kmalloc(_RET_IP_, ret,
>  			      size, PAGE_SIZE << get_order(size),
>  			      flags, node);
>  		return ret;
> @@ -946,8 +946,9 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
>  
>  	ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
>  	ret = kasan_kmalloc(s, ret, size, flags);
> -	trace_kmalloc(_RET_IP_, ret, s, size,
> -		      s->size, flags, node);
> +	trace_kmalloc(_RET_IP_, ret,
> +		      size, s->size,
> +		      flags, node);
>  	return ret;
>  }
>  
> @@ -1075,8 +1076,9 @@ void *kmalloc_large(size_t size, gfp_t flags)
>  {
>  	void *ret = __kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
>  
> -	trace_kmalloc(_RET_IP_, ret, NULL, size,
> -		      PAGE_SIZE << get_order(size), flags, NUMA_NO_NODE);
> +	trace_kmalloc(_RET_IP_, ret,
> +		      size, PAGE_SIZE << get_order(size),
> +		      flags, NUMA_NO_NODE);
>  	return ret;
>  }
>  EXPORT_SYMBOL(kmalloc_large);
> @@ -1090,8 +1092,9 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
>  {
>  	void *ret = __kmalloc_large_node_notrace(size, flags, node);
>  
> -	trace_kmalloc(_RET_IP_, ret, NULL, size,
> -		      PAGE_SIZE << get_order(size), flags, node);
> +	trace_kmalloc(_RET_IP_, ret,
> +		      size, PAGE_SIZE << get_order(size),
> +		      flags, node);
>  	return ret;
>  }
>  EXPORT_SYMBOL(kmalloc_large_node);
> diff --git a/mm/slob.c b/mm/slob.c
> index a4d50d957c25..97a4d2407f96 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -507,8 +507,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
>  		*m = size;
>  		ret = (void *)m + minalign;
>  
> -		trace_kmalloc(caller, ret, NULL,
> -			      size, size + minalign, gfp, node);
> +		trace_kmalloc(caller, ret, size, size + minalign, gfp, node);
>  	} else {
>  		unsigned int order = get_order(size);
>  
> @@ -516,8 +515,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
>  			gfp |= __GFP_COMP;
>  		ret = slob_new_pages(gfp, order, node);
>  
> -		trace_kmalloc(caller, ret, NULL,
> -			      size, PAGE_SIZE << order, gfp, node);
> +		trace_kmalloc(caller, ret, size, PAGE_SIZE << order, gfp, node);
>  	}
>  
>  	kmemleak_alloc(ret, size, 1, gfp);
> @@ -608,14 +606,10 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
>  
>  	if (c->size < PAGE_SIZE) {
>  		b = slob_alloc(c->size, flags, c->align, node, 0);
> -		trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
> -				       SLOB_UNITS(c->size) * SLOB_UNIT,

Hm, we lose the SLOB_UNITS() adjustment here. Maybe we don't care that much;
maybe c->size could simply be stored already rounded up with SLOB_UNITS()
(but I didn't check whether anything else breaks with that).
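
Something like the following in SLOB's __kmem_cache_create() might do it
(a completely untested sketch; as said, other users of c->size would need
checking):

	/* round up once at creation so c->size matches what slob_alloc()
	 * will actually hand out for small objects */
	if (c->size < PAGE_SIZE)
		c->size = SLOB_UNITS(c->size) * SLOB_UNIT;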

Thanks!

> -				       flags, node);
> +		trace_kmem_cache_alloc(_RET_IP_, b, c, flags, node);
>  	} else {
>  		b = slob_new_pages(flags, get_order(c->size), node);
> -		trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
> -				       PAGE_SIZE << get_order(c->size),
> -				       flags, node);
> +		trace_kmem_cache_alloc(_RET_IP_, b, c, flags, node);
>  	}
>  
>  	if (b && c->ctor) {
> @@ -671,7 +665,7 @@ static void kmem_rcu_free(struct rcu_head *head)
>  void kmem_cache_free(struct kmem_cache *c, void *b)
>  {
>  	kmemleak_free_recursive(b, c->flags);
> -	trace_kmem_cache_free(_RET_IP_, b, c->name);
> +	trace_kmem_cache_free(_RET_IP_, b, c);
>  	if (unlikely(c->flags & SLAB_TYPESAFE_BY_RCU)) {
>  		struct slob_rcu *slob_rcu;
>  		slob_rcu = b + (c->size - sizeof(struct slob_rcu));
> diff --git a/mm/slub.c b/mm/slub.c
> index f1aa51480dc4..a49c69469c64 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3243,8 +3243,7 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
>  {
>  	void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
>  
> -	trace_kmem_cache_alloc(_RET_IP_, ret, s, s->object_size,
> -				s->size, gfpflags, NUMA_NO_NODE);
> +	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
>  
>  	return ret;
>  }
> @@ -3274,7 +3273,7 @@ void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
>  void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
>  {
>  	void *ret = slab_alloc(s, NULL, gfpflags, _RET_IP_, size);
> -	trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, NUMA_NO_NODE);
> +	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
>  	ret = kasan_kmalloc(s, ret, size, gfpflags);
>  	return ret;
>  }
> @@ -3285,8 +3284,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
>  {
>  	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
>  
> -	trace_kmem_cache_alloc(_RET_IP_, ret, s,
> -			       s->object_size, s->size, gfpflags, node);
> +	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, node);
>  
>  	return ret;
>  }
> @@ -3299,8 +3297,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
>  {
>  	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
>  
> -	trace_kmalloc(_RET_IP_, ret, s,
> -		      size, s->size, gfpflags, node);
> +	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);
>  
>  	ret = kasan_kmalloc(s, ret, size, gfpflags);
>  	return ret;
> @@ -3544,7 +3541,7 @@ void kmem_cache_free(struct kmem_cache *s, void *x)
>  	s = cache_from_obj(s, x);
>  	if (!s)
>  		return;
> -	trace_kmem_cache_free(_RET_IP_, x, s->name);
> +	trace_kmem_cache_free(_RET_IP_, x, s);
>  	slab_free(s, virt_to_slab(x), x, NULL, &x, 1, _RET_IP_);
>  }
>  EXPORT_SYMBOL(kmem_cache_free);


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 15/16] mm/slab_common: move definition of __ksize() to mm/slab.h
  2022-07-12 13:39 ` [PATCH 15/16] mm/slab_common: move definition of __ksize() to mm/slab.h Hyeonggon Yoo
@ 2022-07-29 11:47   ` Vlastimil Babka
  2022-08-02  9:25     ` Hyeonggon Yoo
  0 siblings, 1 reply; 49+ messages in thread
From: Vlastimil Babka @ 2022-07-29 11:47 UTC (permalink / raw)
  To: Hyeonggon Yoo, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, Joe Perches,
	Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> __ksize() is only called by KASAN. Remove export symbol and move
> definition to mm/slab.h as we don't want to grow its callers.
> 
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Nit: The subject and changelog should probably say declaration, not definition.

> ---
>  include/linux/slab.h |  1 -
>  mm/slab.h            |  2 ++
>  mm/slab_common.c     | 11 +----------
>  mm/slob.c            |  1 -
>  4 files changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 4ee5b2fed164..701fe538650f 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -187,7 +187,6 @@ int kmem_cache_shrink(struct kmem_cache *s);
>  void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __alloc_size(2);
>  void kfree(const void *objp);
>  void kfree_sensitive(const void *objp);
> -size_t __ksize(const void *objp);
>  size_t ksize(const void *objp);
>  #ifdef CONFIG_PRINTK
>  bool kmem_valid_obj(void *object);
> diff --git a/mm/slab.h b/mm/slab.h
> index 9193e9c1f040..ad634e02b3cb 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -678,6 +678,8 @@ void free_large_kmalloc(struct folio *folio, void *object);
>  
>  #endif /* CONFIG_SLOB */
>  
> +size_t __ksize(const void *objp);
> +
>  static inline size_t slab_ksize(const struct kmem_cache *s)
>  {
>  #ifndef CONFIG_SLUB
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index c01c6b8f0d34..1f8db7959366 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1003,15 +1003,7 @@ void kfree(const void *object)
>  }
>  EXPORT_SYMBOL(kfree);
>  
> -/**
> - * __ksize -- Uninstrumented ksize.
> - * @objp: pointer to the object
> - *
> - * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
> - * safety checks as ksize() with KASAN instrumentation enabled.
> - *
> - * Return: size of the actual memory used by @objp in bytes
> - */
> +/* Uninstrumented ksize. Only called by KASAN. */
>  size_t __ksize(const void *object)
>  {
>  	struct folio *folio;
> @@ -1026,7 +1018,6 @@ size_t __ksize(const void *object)
>  
>  	return slab_ksize(folio_slab(folio)->slab_cache);
>  }
> -EXPORT_SYMBOL(__ksize);
>  #endif /* !CONFIG_SLOB */
>  
>  gfp_t kmalloc_fix_flags(gfp_t flags)
> diff --git a/mm/slob.c b/mm/slob.c
> index 97a4d2407f96..91d6e2b19929 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -584,7 +584,6 @@ size_t __ksize(const void *block)
>  	m = (unsigned int *)(block - align);
>  	return SLOB_UNITS(*m) * SLOB_UNIT;
>  }
> -EXPORT_SYMBOL(__ksize);
>  
>  int __kmem_cache_create(struct kmem_cache *c, slab_flags_t flags)
>  {


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize()
  2022-07-12 13:39 ` [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize() Hyeonggon Yoo
  2022-07-12 15:13   ` Christoph Lameter
@ 2022-07-29 11:50   ` Vlastimil Babka
  1 sibling, 0 replies; 49+ messages in thread
From: Vlastimil Babka @ 2022-07-29 11:50 UTC (permalink / raw)
  To: Hyeonggon Yoo, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, Joe Perches,
	Vasily Averin, Matthew WilCox, Marco Elver
  Cc: linux-kernel, linux-mm

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> __ksize() returns the size of objects allocated from the slab allocator.
> When an invalid object is passed to __ksize(), returning zero
> prevents further memory corruption and allows the caller to
> check for the error.
> 
> If the address of a large object is not the beginning of the folio, or the
> size of the folio is too small, it must be invalid. Return zero in such cases.
> 
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

If you want to change it to BUG() I won't object, no strong opinion.
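
(That would be roughly the following instead of the WARN_ON() in the hunk
below; just a sketch:)

	if (unlikely(!folio_test_slab(folio))) {
		BUG_ON(object != folio_address(folio) ||
		       folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE);
		return folio_size(folio);
	}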

> ---
>  mm/slab_common.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 1f8db7959366..0d6cbe9d7ad0 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1013,8 +1013,12 @@ size_t __ksize(const void *object)
>  
>  	folio = virt_to_folio(object);
>  
> -	if (unlikely(!folio_test_slab(folio)))
> +	if (unlikely(!folio_test_slab(folio))) {
> +		if (WARN_ON(object != folio_address(folio) ||
> +				folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE))
> +			return 0;
>  		return folio_size(folio);
> +	}
>  
>  	return slab_ksize(folio_slab(folio)->slab_cache);
>  }


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 00/15] common kmalloc v3
  2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
                   ` (15 preceding siblings ...)
  2022-07-12 13:39 ` [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize() Hyeonggon Yoo
@ 2022-07-29 15:08 ` Vlastimil Babka
  2022-08-14 10:06   ` Hyeonggon Yoo
  16 siblings, 1 reply; 49+ messages in thread
From: Vlastimil Babka @ 2022-07-29 15:08 UTC (permalink / raw)
  To: Hyeonggon Yoo, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, Joe Perches,
	Vasily Averin, Matthew WilCox
  Cc: linux-kernel, linux-mm

On 7/12/22 15:39, Hyeonggon Yoo wrote:
> This is v3 of common kmalloc series.
> 
> This series generalize most of kmalloc functions and move its
> implementation into mm/slab_common.c.
> 
> I believe this series give better maintainability of code for SLAB and SLUB.
> Please consider applying.
> 
> This series is based on slab/for-next and also available at
> https://github.com/hygoni/linux/tree/slab-common-v3r0
> 
> Any feedbacks/reviews will be appreciated.

Hi, thanks for all your efforts. It's shaping up nicely, so I think the next
version will be ready to be added to -next after the 5.20 merge window.
As I've finished the individual reviews, I'm looking at the result and see a
bit more potential for cleanups, which could perhaps be incorporated into
the existing patches, or added as extra patches:

- kmalloc_large_node_notrace() has only one caller, so it can be removed and
the caller can call __kmalloc_large_node_notrace() directly, especially if
it's not __always_inline, as IIRC I've suggested.

- kmem_cache_alloc_trace() and kmem_cache_alloc_node_trace() are weird ones:
they are in fact for kmalloc despite the name. They depend on
CONFIG_TRACING, yet if you look carefully, the !CONFIG_TRACING variant also
goes through a trace_* function. The actual difference seems to be that
slab_alloc(), and thus kasan_alloc() and kfence_alloc(), don't get the
orig_size that way, which is dubious. It might be worth trying to unify
this as well (roughly sketched below)? E.g.
  - use only the CONFIG_TRACING variant, discard the other
  - declare it in mm/slab.h, this is not for general usage
  - single implementation in mm/slab_common.c that calls
    __kmem_cache_alloc_node() from SLAB/SLUB and does the trace
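
Putting those together, a very rough sketch of the unified variant (the
kmalloc_trace() name here is only illustrative, not a final API;
__kmem_cache_alloc_node() is the per-allocator hook mentioned above):

/* mm/slab.h - not for general use */
void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size);

/* mm/slab_common.c - single implementation for SLAB and SLUB */
void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
{
	void *ret = __kmem_cache_alloc_node(s, flags, NUMA_NO_NODE,
					    size, _RET_IP_);

	ret = kasan_kmalloc(s, ret, size, flags);
	trace_kmalloc(_RET_IP_, ret, size, s->size, flags, NUMA_NO_NODE);
	return ret;
}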

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 06/15] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large()
  2022-07-28 15:23   ` Vlastimil Babka
@ 2022-08-01 13:26     ` Hyeonggon Yoo
  2022-08-01 13:36       ` Vlastimil Babka
  0 siblings, 1 reply; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-08-01 13:26 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, Joe Perches, Vasily Averin,
	Matthew WilCox, linux-kernel, linux-mm

On Thu, Jul 28, 2022 at 05:23:29PM +0200, Vlastimil Babka wrote:
> On 7/12/22 15:39, Hyeonggon Yoo wrote:
> > There is no caller of kmalloc_order_trace() except kmalloc_large().
> > Fold it into kmalloc_large() and remove kmalloc_order{,_trace}().
> > 
> > Also add tracepoint in kmalloc_large() that was previously
> > in kmalloc_order_trace().
> > 
> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> 

Thank you for review.

> Hmm noticed a small change in how we call trace_kmalloc() which will now
> include the __GFP_COMP.

Good catch.

> I think we could just call alloc_pages() from
> kmalloc_large() with flags | __GFP_COMP instead of doing flags |=
> __GFP_COMP; first.

I'll consider that in next version.

> AFAICS both kasan_kmalloc_large() and kmemleak_alloc()
> will filter it out anyway.

AFAIK not passing __GFP_COMP to both kasan_kmalloc_large()
and kmemleak_alloc() will affect their behavior.

- kasan_kmalloc_large() only checks if flag has __GFP_DIRECT_RECLAIM,
- kmemleak_alloc() only use it to allocate kmemleak_object from
  slab/pool, which does not require __GFP_COMP.

simply calling
	alloc_pages() with flags | __GFP_COMP
	kasan_kmalloc_large() with flags
	kmemleak_alloc() with flags
	trace_kmalloc() with flags
will be okay.
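
I.e., roughly the following in kmalloc_large() (a sketch of the relevant
lines from the hunk below; only the page allocator sees __GFP_COMP):

	page = alloc_pages(flags | __GFP_COMP, order);
	if (likely(page))
		ret = page_address(page);
	ret = kasan_kmalloc_large(ret, size, flags);
	/* As ret might get tagged, call kmemleak hook after KASAN. */
	kmemleak_alloc(ret, size, 1, flags);
	trace_kmalloc(_RET_IP_, ret, NULL, size, PAGE_SIZE << order, flags);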

BTW this is fixed after patch 9.

> > ---
> >  include/linux/slab.h | 22 ++--------------------
> >  mm/slab_common.c     | 14 +++-----------
> >  2 files changed, 5 insertions(+), 31 deletions(-)
> > 
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index a0e57df3d5a4..15a4c59da59e 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -489,26 +489,8 @@ static __always_inline void *kmem_cache_alloc_node_trace(struct kmem_cache *s, g
> >  }
> >  #endif /* CONFIG_TRACING */
> >  
> > -extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) __assume_page_alignment
> > -									 __alloc_size(1);
> > -
> > -#ifdef CONFIG_TRACING
> > -extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
> > -				__assume_page_alignment __alloc_size(1);
> > -#else
> > -static __always_inline __alloc_size(1) void *kmalloc_order_trace(size_t size, gfp_t flags,
> > -								 unsigned int order)
> > -{
> > -	return kmalloc_order(size, flags, order);
> > -}
> > -#endif
> > -
> > -static __always_inline __alloc_size(1) void *kmalloc_large(size_t size, gfp_t flags)
> > -{
> > -	unsigned int order = get_order(size);
> > -	return kmalloc_order_trace(size, flags, order);
> > -}
> > -
> > +void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment
> > +					      __alloc_size(1);
> >  /**
> >   * kmalloc - allocate memory
> >   * @size: how many bytes of memory are required.
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 6c9aac5d8f4a..1f8af2106df0 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -932,10 +932,11 @@ gfp_t kmalloc_fix_flags(gfp_t flags)
> >   * directly to the page allocator. We use __GFP_COMP, because we will need to
> >   * know the allocation order to free the pages properly in kfree.
> >   */
> > -void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
> > +void *kmalloc_large(size_t size, gfp_t flags)
> >  {
> >  	void *ret = NULL;
> >  	struct page *page;
> > +	unsigned int order = get_order(size);
> >  
> >  	if (unlikely(flags & GFP_SLAB_BUG_MASK))
> >  		flags = kmalloc_fix_flags(flags);
> > @@ -950,19 +951,10 @@ void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
> >  	ret = kasan_kmalloc_large(ret, size, flags);
> >  	/* As ret might get tagged, call kmemleak hook after KASAN. */
> >  	kmemleak_alloc(ret, size, 1, flags);
> > -	return ret;
> > -}
> > -EXPORT_SYMBOL(kmalloc_order);
> > -
> > -#ifdef CONFIG_TRACING
> > -void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
> > -{
> > -	void *ret = kmalloc_order(size, flags, order);
> >  	trace_kmalloc(_RET_IP_, ret, NULL, size, PAGE_SIZE << order, flags);
> >  	return ret;
> >  }
> > -EXPORT_SYMBOL(kmalloc_order_trace);
> > -#endif
> > +EXPORT_SYMBOL(kmalloc_large);
> >  
> >  #ifdef CONFIG_SLAB_FREELIST_RANDOM
> >  /* Randomize a generic freelist */
> 

-- 
Thanks,
Hyeonggon

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 06/15] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large()
  2022-08-01 13:26     ` Hyeonggon Yoo
@ 2022-08-01 13:36       ` Vlastimil Babka
  2022-08-02  2:54         ` Hyeonggon Yoo
  0 siblings, 1 reply; 49+ messages in thread
From: Vlastimil Babka @ 2022-08-01 13:36 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, Joe Perches, Vasily Averin,
	Matthew WilCox, linux-kernel, linux-mm

On 8/1/22 15:26, Hyeonggon Yoo wrote:
> On Thu, Jul 28, 2022 at 05:23:29PM +0200, Vlastimil Babka wrote:
>> On 7/12/22 15:39, Hyeonggon Yoo wrote:
>> > There is no caller of kmalloc_order_trace() except kmalloc_large().
>> > Fold it into kmalloc_large() and remove kmalloc_order{,_trace}().
>> > 
>> > Also add tracepoint in kmalloc_large() that was previously
>> > in kmalloc_order_trace().
>> > 
>> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
>> > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>> 
> 
> Thank you for review.
> 
>> Hmm noticed a small change in how we call trace_kmalloc() which will now
>> include the __GFP_COMP.
> 
> Good catch.
> 
>> I think we could just call alloc_pages() from
>> kmalloc_large() with flags | __GFP_COMP instead of doing flags |=
>> __GFP_COMP; first.
> 
> I'll consider that in next version.
> 
>> AFAICS both kasan_kmalloc_large() and kmemleak_alloc()
>> will filter it out anyway.
> 
> AFAIK not passing __GFP_COMP to both kasan_kmalloc_large()
> and kmemleak_alloc() will affect their behavior.

You mean "will NOT affect...", right?

> - kasan_kmalloc_large() only checks if flag has __GFP_DIRECT_RECLAIM,
> - kmemleak_alloc() only use it to allocate kmemleak_object from
>   slab/pool, which does not require __GFP_COMP.
> 
> simply calling
> 	alloc_pages() with flags | GFP_COMP
> 	kasan_kmalloc_large() with flags
> 	kmemleak_alloc() with flags
> 	trace_kmalloc() with flags
> will be okay.
> 
> BTW this is fixed after patch 9.

Ah, right. __kmalloc_large_node_notrace() does the same  flags |=
__GFP_COMP; but the tracepoint is outside and unaffected.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 08/15] mm/slab_common: kmalloc_node: pass large requests to page allocator
  2022-07-28 16:09   ` Vlastimil Babka
@ 2022-08-01 14:37     ` Hyeonggon Yoo
  2022-08-01 14:44       ` Vlastimil Babka
  0 siblings, 1 reply; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-08-01 14:37 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, Joe Perches, Vasily Averin,
	Matthew WilCox, linux-kernel, linux-mm

On Thu, Jul 28, 2022 at 06:09:27PM +0200, Vlastimil Babka wrote:
> On 7/12/22 15:39, Hyeonggon Yoo wrote:
> > Now that kmalloc_large_node() is in common code, pass large requests
> > to page allocator in kmalloc_node() using kmalloc_large_node().
> > 
> > One problem is that currently there is no tracepoint in
> > kmalloc_large_node(). Instead of simply putting tracepoint in it,
> > use kmalloc_large_node{,_notrace} depending on its caller to show
> > useful address for both inlined kmalloc_node() and
> > __kmalloc_node_track_caller() when large objects are allocated.
> > 
> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> 
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>

Thanks!

> Nit below:
> 
> > ---
> >  v3:
> > 	This patch is new in v3 and this avoids
> > 	missing caller in __kmalloc_large_node_track_caller()
> > 	when kmalloc_large_node() is called.
> > 
> >  include/linux/slab.h | 26 +++++++++++++++++++-------
> >  mm/slab.h            |  2 ++
> >  mm/slab_common.c     | 17 ++++++++++++++++-
> >  mm/slub.c            |  2 +-
> >  4 files changed, 38 insertions(+), 9 deletions(-)
> > 
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index 082499306098..fd2e129fc813 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -571,23 +571,35 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
> >  	return __kmalloc(size, flags);
> >  }
> >  
> > +#ifndef CONFIG_SLOB
> >  static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
> >  {
> > -#ifndef CONFIG_SLOB
> > -	if (__builtin_constant_p(size) &&
> > -		size <= KMALLOC_MAX_CACHE_SIZE) {
> > -		unsigned int i = kmalloc_index(size);
> > +	if (__builtin_constant_p(size)) {
> > +		unsigned int index;
> >  
> > -		if (!i)
> > +		if (size > KMALLOC_MAX_CACHE_SIZE)
> > +			return kmalloc_large_node(size, flags, node);
> > +
> > +		index = kmalloc_index(size);
> > +
> > +		if (!index)
> >  			return ZERO_SIZE_PTR;
> >  
> >  		return kmem_cache_alloc_node_trace(
> > -				kmalloc_caches[kmalloc_type(flags)][i],
> > +				kmalloc_caches[kmalloc_type(flags)][index],
> >  						flags, node, size);
> >  	}
> > -#endif
> >  	return __kmalloc_node(size, flags, node);
> >  }
> > +#else
> > +static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
> > +{
> > +	if (__builtin_constant_p(size) && size > KMALLOC_MAX_CACHE_SIZE)
> > +		return kmalloc_large_node(size, flags, node);
> > +
> > +	return __kmalloc_node(size, flags, node);
> > +}
> > +#endif
> >  
> >  /**
> >   * kmalloc_array - allocate memory for an array.
> > diff --git a/mm/slab.h b/mm/slab.h
> > index a8d5eb1c323f..7cb51ff44f0c 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -273,6 +273,8 @@ void create_kmalloc_caches(slab_flags_t);
> >  
> >  /* Find the kmalloc slab corresponding for a certain size */
> >  struct kmem_cache *kmalloc_slab(size_t, gfp_t);
> > +
> > +void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node);
> >  #endif
> >  
> >  gfp_t kmalloc_fix_flags(gfp_t flags);
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 6f855587b635..dc872e0ef0fc 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -956,7 +956,8 @@ void *kmalloc_large(size_t size, gfp_t flags)
> >  }
> >  EXPORT_SYMBOL(kmalloc_large);
> >  
> > -void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> > +static __always_inline
> 
> I don't think we need to inline, compiler should be able to make
> kmalloc_large_node(_notrace) quite efficient anyway.

You mean s/static __always_inline/static/g, or like this?

kmalloc_large_node_notrace():
	fold __kmalloc_large_node_notrace() into here

kmalloc_large_node():
	kmalloc_large_node_notrace()
	trace_kmalloc()

I have no strong opinion.

IMO it's unlikely that there would be workloads that are
meaningfully affected by inlining or not inlining __kmalloc_large_node_notrace().
I just wanted to avoid adding even a tiny bit of overhead with this series.

> 
> > +void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
> >  {
> >  	struct page *page;
> >  	void *ptr = NULL;
> > @@ -976,6 +977,20 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> >  
> >  	return ptr;
> >  }
> > +
> > +void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
> > +{
> > +	return __kmalloc_large_node_notrace(size, flags, node);
> > +}
> > +
> > +void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> > +{
> > +	void *ret = __kmalloc_large_node_notrace(size, flags, node);
> > +
> > +	trace_kmalloc_node(_RET_IP_, ret, NULL, size,
> > +			   PAGE_SIZE << get_order(size), flags, node);
> > +	return ret;
> > +}
> >  EXPORT_SYMBOL(kmalloc_large_node);
> >  
> >  #ifdef CONFIG_SLAB_FREELIST_RANDOM
> > diff --git a/mm/slub.c b/mm/slub.c
> > index f22a84dd27de..3d02cf44adf7 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -4401,7 +4401,7 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
> >  	void *ret;
> >  
> >  	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> > -		ret = kmalloc_large_node(size, flags, node);
> > +		ret = kmalloc_large_node_notrace(size, flags, node);
> >  
> >  		trace_kmalloc_node(caller, ret, NULL,
> >  				   size, PAGE_SIZE << get_order(size),
> 

-- 
Thanks,
Hyeonggon

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 08/15] mm/slab_common: kmalloc_node: pass large requests to page allocator
  2022-08-01 14:37     ` Hyeonggon Yoo
@ 2022-08-01 14:44       ` Vlastimil Babka
  2022-08-02  8:59         ` Hyeonggon Yoo
  0 siblings, 1 reply; 49+ messages in thread
From: Vlastimil Babka @ 2022-08-01 14:44 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, Joe Perches, Vasily Averin,
	Matthew WilCox, linux-kernel, linux-mm


On 8/1/22 16:37, Hyeonggon Yoo wrote:
> On Thu, Jul 28, 2022 at 06:09:27PM +0200, Vlastimil Babka wrote:
>> On 7/12/22 15:39, Hyeonggon Yoo wrote:
>> > Now that kmalloc_large_node() is in common code, pass large requests
>> > to page allocator in kmalloc_node() using kmalloc_large_node().
>> > 
>> > One problem is that currently there is no tracepoint in
>> > kmalloc_large_node(). Instead of simply putting tracepoint in it,
>> > use kmalloc_large_node{,_notrace} depending on its caller to show
>> > useful address for both inlined kmalloc_node() and
>> > __kmalloc_node_track_caller() when large objects are allocated.
>> > 
>> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
>> 
>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>>
> 
> Thanks!
> 
>> Nit below:
>> 
>> > ---
>> >  v3:
>> > 	This patch is new in v3 and this avoids
>> > 	missing caller in __kmalloc_large_node_track_caller()
>> > 	when kmalloc_large_node() is called.
>> > 
>> >  include/linux/slab.h | 26 +++++++++++++++++++-------
>> >  mm/slab.h            |  2 ++
>> >  mm/slab_common.c     | 17 ++++++++++++++++-
>> >  mm/slub.c            |  2 +-
>> >  4 files changed, 38 insertions(+), 9 deletions(-)
>> > 
>> > diff --git a/include/linux/slab.h b/include/linux/slab.h
>> > index 082499306098..fd2e129fc813 100644
>> > --- a/include/linux/slab.h
>> > +++ b/include/linux/slab.h
>> > @@ -571,23 +571,35 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
>> >  	return __kmalloc(size, flags);
>> >  }
>> >  
>> > +#ifndef CONFIG_SLOB
>> >  static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
>> >  {
>> > -#ifndef CONFIG_SLOB
>> > -	if (__builtin_constant_p(size) &&
>> > -		size <= KMALLOC_MAX_CACHE_SIZE) {
>> > -		unsigned int i = kmalloc_index(size);
>> > +	if (__builtin_constant_p(size)) {
>> > +		unsigned int index;
>> >  
>> > -		if (!i)
>> > +		if (size > KMALLOC_MAX_CACHE_SIZE)
>> > +			return kmalloc_large_node(size, flags, node);
>> > +
>> > +		index = kmalloc_index(size);
>> > +
>> > +		if (!index)
>> >  			return ZERO_SIZE_PTR;
>> >  
>> >  		return kmem_cache_alloc_node_trace(
>> > -				kmalloc_caches[kmalloc_type(flags)][i],
>> > +				kmalloc_caches[kmalloc_type(flags)][index],
>> >  						flags, node, size);
>> >  	}
>> > -#endif
>> >  	return __kmalloc_node(size, flags, node);
>> >  }
>> > +#else
>> > +static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
>> > +{
>> > +	if (__builtin_constant_p(size) && size > KMALLOC_MAX_CACHE_SIZE)
>> > +		return kmalloc_large_node(size, flags, node);
>> > +
>> > +	return __kmalloc_node(size, flags, node);
>> > +}
>> > +#endif
>> >  
>> >  /**
>> >   * kmalloc_array - allocate memory for an array.
>> > diff --git a/mm/slab.h b/mm/slab.h
>> > index a8d5eb1c323f..7cb51ff44f0c 100644
>> > --- a/mm/slab.h
>> > +++ b/mm/slab.h
>> > @@ -273,6 +273,8 @@ void create_kmalloc_caches(slab_flags_t);
>> >  
>> >  /* Find the kmalloc slab corresponding for a certain size */
>> >  struct kmem_cache *kmalloc_slab(size_t, gfp_t);
>> > +
>> > +void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node);
>> >  #endif
>> >  
>> >  gfp_t kmalloc_fix_flags(gfp_t flags);
>> > diff --git a/mm/slab_common.c b/mm/slab_common.c
>> > index 6f855587b635..dc872e0ef0fc 100644
>> > --- a/mm/slab_common.c
>> > +++ b/mm/slab_common.c
>> > @@ -956,7 +956,8 @@ void *kmalloc_large(size_t size, gfp_t flags)
>> >  }
>> >  EXPORT_SYMBOL(kmalloc_large);
>> >  
>> > -void *kmalloc_large_node(size_t size, gfp_t flags, int node)
>> > +static __always_inline
>> 
>> I don't think we need to inline, compiler should be able to make
>> kmalloc_large_node(_notrace) quite efficient anyway.
> 
> You mean s/static __always_inline/static/g, or like this?
> 
> kmalloc_large_node_notrace():
> 	fold __kmalloc_large_node_notrace() into here
> 
> kmalloc_large_node():
> 	kmalloc_large_node_notrace()
> 	trace_kmalloc()
> 
> I have no strong opinion.
> 
> IMO it's unlikely that there would be workloads that are
> meaningfully affected by inlining or not inlining __kmalloc_large_node_notrace().
> I just wanted to avoid adding even a tiny bit of overhead with this series.

I tried the following quick change on top of your series and got (with
CONFIG_SLUB):

add/remove: 2/4 grow/shrink: 0/2 up/down: 208/-588 (-380)
Function                                     old     new   delta
__kmalloc_large_node                           -     186    +186
__kmalloc_large_node.cold                      -      22     +22
kmalloc_large_node.cold                       15       -     -15
kmalloc_large.cold                            15       -     -15
kmalloc_large_node_notrace.cold               22       -     -22
kmalloc_large                                252      87    -165
kmalloc_large_node                           271      86    -185
kmalloc_large_node_notrace                   186       -    -186
Total: Before=49398081, After=49397701, chg -0.00%


diff --git a/mm/slab.h b/mm/slab.h
index ad634e02b3cb..7b8963394144 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -274,7 +274,6 @@ void create_kmalloc_caches(slab_flags_t);
 /* Find the kmalloc slab corresponding for a certain size */
 struct kmem_cache *kmalloc_slab(size_t, gfp_t);
 
-void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node);
 void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
 			      int node, size_t orig_size,
 			      unsigned long caller);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 0d6cbe9d7ad0..07a6bf1cff36 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -925,6 +925,7 @@ void free_large_kmalloc(struct folio *folio, void *object)
 	__free_pages(folio_page(folio, 0), order);
 }
 
+static void *__kmalloc_large_node(size_t size, gfp_t flags, int node);
 static __always_inline
 void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
 {
@@ -932,7 +933,7 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
 	void *ret;
 
 	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
-		ret = kmalloc_large_node_notrace(size, flags, node);
+		ret = __kmalloc_large_node(size, flags, node);
 		trace_kmalloc(_RET_IP_, ret,
 			      size, PAGE_SIZE << get_order(size),
 			      flags, node);
@@ -1042,8 +1043,7 @@ gfp_t kmalloc_fix_flags(gfp_t flags)
  * know the allocation order to free the pages properly in kfree.
  */
 
-static __always_inline
-void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
+static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
 {
 	struct page *page;
 	void *ptr = NULL;
@@ -1069,7 +1069,7 @@ void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
 
 void *kmalloc_large(size_t size, gfp_t flags)
 {
-	void *ret = __kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
+	void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);
 
 	trace_kmalloc(_RET_IP_, ret,
 		      size, PAGE_SIZE << get_order(size),
@@ -1078,14 +1078,9 @@ void *kmalloc_large(size_t size, gfp_t flags)
 }
 EXPORT_SYMBOL(kmalloc_large);
 
-void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
-{
-	return __kmalloc_large_node_notrace(size, flags, node);
-}
-
 void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 {
-	void *ret = __kmalloc_large_node_notrace(size, flags, node);
+	void *ret = __kmalloc_large_node(size, flags, node);
 
 	trace_kmalloc(_RET_IP_, ret,
 		      size, PAGE_SIZE << get_order(size),

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 06/15] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large()
  2022-08-01 13:36       ` Vlastimil Babka
@ 2022-08-02  2:54         ` Hyeonggon Yoo
  0 siblings, 0 replies; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-08-02  2:54 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, Joe Perches, Vasily Averin,
	Matthew WilCox, linux-kernel, linux-mm

On Mon, Aug 01, 2022 at 03:36:48PM +0200, Vlastimil Babka wrote:
> On 8/1/22 15:26, Hyeonggon Yoo wrote:
> > On Thu, Jul 28, 2022 at 05:23:29PM +0200, Vlastimil Babka wrote:
> >> On 7/12/22 15:39, Hyeonggon Yoo wrote:
> >> > There is no caller of kmalloc_order_trace() except kmalloc_large().
> >> > Fold it into kmalloc_large() and remove kmalloc_order{,_trace}().
> >> > 
> >> > Also add tracepoint in kmalloc_large() that was previously
> >> > in kmalloc_order_trace().
> >> > 
> >> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> >> > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> >> 
> > 
> > Thank you for review.
> > 
> >> Hmm noticed a small change in how we call trace_kmalloc() which will now
> >> include the __GFP_COMP.
> > 
> > Good catch.
> > 
> >> I think we could just call alloc_pages() from
> >> kmalloc_large() with flags | __GFP_COMP instead of doing flags |=
> >> __GFP_COMP; first.
> > 
> > I'll consider that in next version.
> > 
> >> AFAICS both kasan_kmalloc_large() and kmemleak_alloc()
> >> will filter it out anyway.
> > 
> > AFAIK not passing __GFP_COMP to both kasan_kmalloc_large()
> > and kmemleak_alloc() will affect their behavior.
> 
> You mean "will NOT affect...", right?

Ah, yes. mistake.

> 
> > - kasan_kmalloc_large() only checks if flag has __GFP_DIRECT_RECLAIM,
> > - kmemleak_alloc() only use it to allocate kmemleak_object from
> >   slab/pool, which does not require __GFP_COMP.
> > 
> > simply calling
> > 	alloc_pages() with flags | __GFP_COMP
> > 	kasan_kmalloc_large() with flags
> > 	kmemleak_alloc() with flags
> > 	trace_kmalloc() with flags
> > will be okay.
> > 
> > BTW this is fixed after patch 9.
> 
> Ah, right. __kmalloc_large_node_notrace() does the same  flags |=
> __GFP_COMP; but the tracepoint is outside and unaffected.

I'll reflect your suggestion anyway.

-- 
Thanks,
Hyeonggon

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 08/15] mm/slab_common: kmalloc_node: pass large requests to page allocator
  2022-08-01 14:44       ` Vlastimil Babka
@ 2022-08-02  8:59         ` Hyeonggon Yoo
  2022-08-02  9:32           ` Vlastimil Babka
  0 siblings, 1 reply; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-08-02  8:59 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, Joe Perches, Vasily Averin,
	Matthew WilCox, linux-kernel, linux-mm

On Mon, Aug 01, 2022 at 04:44:22PM +0200, Vlastimil Babka wrote:
> 
> On 8/1/22 16:37, Hyeonggon Yoo wrote:
> > On Thu, Jul 28, 2022 at 06:09:27PM +0200, Vlastimil Babka wrote:
> >> On 7/12/22 15:39, Hyeonggon Yoo wrote:
> >> > Now that kmalloc_large_node() is in common code, pass large requests
> >> > to page allocator in kmalloc_node() using kmalloc_large_node().
> >> > 
> >> > One problem is that currently there is no tracepoint in
> >> > kmalloc_large_node(). Instead of simply putting tracepoint in it,
> >> > use kmalloc_large_node{,_notrace} depending on its caller to show
> >> > useful address for both inlined kmalloc_node() and
> >> > __kmalloc_node_track_caller() when large objects are allocated.
> >> > 
> >> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> >> 
> >> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> >>
> > 
> > Thanks!
> > 
> >> Nit below:
> >> 
> >> > ---
> >> >  v3:
> >> > 	This patch is new in v3 and this avoids
> >> > 	missing caller in __kmalloc_large_node_track_caller()
> >> > 	when kmalloc_large_node() is called.
> >> > 
> >> >  include/linux/slab.h | 26 +++++++++++++++++++-------
> >> >  mm/slab.h            |  2 ++
> >> >  mm/slab_common.c     | 17 ++++++++++++++++-
> >> >  mm/slub.c            |  2 +-
> >> >  4 files changed, 38 insertions(+), 9 deletions(-)
> >> > 
> >> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> >> > index 082499306098..fd2e129fc813 100644
> >> > --- a/include/linux/slab.h
> >> > +++ b/include/linux/slab.h
> >> > @@ -571,23 +571,35 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
> >> >  	return __kmalloc(size, flags);
> >> >  }
> >> >  
> >> > +#ifndef CONFIG_SLOB
> >> >  static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
> >> >  {
> >> > -#ifndef CONFIG_SLOB
> >> > -	if (__builtin_constant_p(size) &&
> >> > -		size <= KMALLOC_MAX_CACHE_SIZE) {
> >> > -		unsigned int i = kmalloc_index(size);
> >> > +	if (__builtin_constant_p(size)) {
> >> > +		unsigned int index;
> >> >  
> >> > -		if (!i)
> >> > +		if (size > KMALLOC_MAX_CACHE_SIZE)
> >> > +			return kmalloc_large_node(size, flags, node);
> >> > +
> >> > +		index = kmalloc_index(size);
> >> > +
> >> > +		if (!index)
> >> >  			return ZERO_SIZE_PTR;
> >> >  
> >> >  		return kmem_cache_alloc_node_trace(
> >> > -				kmalloc_caches[kmalloc_type(flags)][i],
> >> > +				kmalloc_caches[kmalloc_type(flags)][index],
> >> >  						flags, node, size);
> >> >  	}
> >> > -#endif
> >> >  	return __kmalloc_node(size, flags, node);
> >> >  }
> >> > +#else
> >> > +static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
> >> > +{
> >> > +	if (__builtin_constant_p(size) && size > KMALLOC_MAX_CACHE_SIZE)
> >> > +		return kmalloc_large_node(size, flags, node);
> >> > +
> >> > +	return __kmalloc_node(size, flags, node);
> >> > +}
> >> > +#endif
> >> >  
> >> >  /**
> >> >   * kmalloc_array - allocate memory for an array.
> >> > diff --git a/mm/slab.h b/mm/slab.h
> >> > index a8d5eb1c323f..7cb51ff44f0c 100644
> >> > --- a/mm/slab.h
> >> > +++ b/mm/slab.h
> >> > @@ -273,6 +273,8 @@ void create_kmalloc_caches(slab_flags_t);
> >> >  
> >> >  /* Find the kmalloc slab corresponding for a certain size */
> >> >  struct kmem_cache *kmalloc_slab(size_t, gfp_t);
> >> > +
> >> > +void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node);
> >> >  #endif
> >> >  
> >> >  gfp_t kmalloc_fix_flags(gfp_t flags);
> >> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> >> > index 6f855587b635..dc872e0ef0fc 100644
> >> > --- a/mm/slab_common.c
> >> > +++ b/mm/slab_common.c
> >> > @@ -956,7 +956,8 @@ void *kmalloc_large(size_t size, gfp_t flags)
> >> >  }
> >> >  EXPORT_SYMBOL(kmalloc_large);
> >> >  
> >> > -void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> >> > +static __always_inline
> >> 
> >> I don't think we need to inline, compiler should be able to make
> >> kmalloc_large_node(_notrace) quite efficient anyway.
> > 
> > You mean s/static __always_inline/static/g, or like this?
> > 
> > kmalloc_large_node_notrace():
> > 	fold __kmalloc_large_node_notrace() into here
> > 
> > kmalloc_large_node():
> > 	kmalloc_large_node_notrace()
> > 	trace_kmalloc()
> > 
> > I have no strong opinion.
> > 
> > IMO it's unlikely that there would be workloads that are
> > meaningfully affected by inlining or not inlining __kmalloc_large_node_notrace().
> > I just wanted to avoid adding even a tiny bit of overhead with this series.
> 
> I tried the following quick change on top of your series and got (with
> CONFIG_SLUB):
> 
> add/remove: 2/4 grow/shrink: 0/2 up/down: 208/-588 (-380)
> Function                                     old     new   delta
> __kmalloc_large_node                           -     186    +186
> __kmalloc_large_node.cold                      -      22     +22
> kmalloc_large_node.cold                       15       -     -15
> kmalloc_large.cold                            15       -     -15
> kmalloc_large_node_notrace.cold               22       -     -22
> kmalloc_large                                252      87    -165
> kmalloc_large_node                           271      86    -185
> kmalloc_large_node_notrace                   186       -    -186
> Total: Before=49398081, After=49397701, chg -0.00%
> 

Yeah, uninlining __kmalloc_large_node saves hundreds of bytes.
And the diff below looks good to me.

By the way, do you have any opinion on inlining slab_alloc_node()?
(It looks like a similar topic.)

AFAIK slab_alloc_node() is inlined in:
        kmem_cache_alloc()
        kmem_cache_alloc_node()
        kmem_cache_alloc_lru()
        kmem_cache_alloc_trace()
        kmem_cache_alloc_node_trace()
        __kmem_cache_alloc_node()

This is what I get after simply dropping __always_inline in slab_alloc_node:

add/remove: 1/1 grow/shrink: 3/6 up/down: 1911/-5275 (-3364)
Function                                     old     new   delta
slab_alloc_node                                -    1356   +1356
sysfs_slab_alias                             134     327    +193
slab_memory_callback                         528     717    +189
__kmem_cache_create                         1325    1498    +173
__slab_alloc.constprop                       135       -    -135
kmem_cache_alloc_trace                       909     196    -713
kmem_cache_alloc                             937     191    -746
kmem_cache_alloc_node_trace                 1020     200    -820
__kmem_cache_alloc_node                      862      19    -843
kmem_cache_alloc_node                       1046     189    -857
kmem_cache_alloc_lru                        1348     187   -1161
Total: Before=32011183, After=32007819, chg -0.01%

So 3.28kB is the cost of eliminating function call overhead in the
fastpath.

This is a tradeoff between function call overhead and
instruction cache usage...

> 
> diff --git a/mm/slab.h b/mm/slab.h
> index ad634e02b3cb..7b8963394144 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -274,7 +274,6 @@ void create_kmalloc_caches(slab_flags_t);
>  /* Find the kmalloc slab corresponding for a certain size */
>  struct kmem_cache *kmalloc_slab(size_t, gfp_t);
>  
> -void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node);
>  void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
>  			      int node, size_t orig_size,
>  			      unsigned long caller);
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 0d6cbe9d7ad0..07a6bf1cff36 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -925,6 +925,7 @@ void free_large_kmalloc(struct folio *folio, void *object)
>  	__free_pages(folio_page(folio, 0), order);
>  }
>  
> +static void *__kmalloc_large_node(size_t size, gfp_t flags, int node);
>  static __always_inline
>  void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
>  {
> @@ -932,7 +933,7 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
>  	void *ret;
>  
>  	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> -		ret = kmalloc_large_node_notrace(size, flags, node);
> +		ret = __kmalloc_large_node(size, flags, node);
>  		trace_kmalloc(_RET_IP_, ret,
>  			      size, PAGE_SIZE << get_order(size),
>  			      flags, node);
> @@ -1042,8 +1043,7 @@ gfp_t kmalloc_fix_flags(gfp_t flags)
>   * know the allocation order to free the pages properly in kfree.
>   */
>  
> -static __always_inline
> -void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
> +static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
>  {
>  	struct page *page;
>  	void *ptr = NULL;
> @@ -1069,7 +1069,7 @@ void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
>  
>  void *kmalloc_large(size_t size, gfp_t flags)
>  {
> -	void *ret = __kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
> +	void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);
>  
>  	trace_kmalloc(_RET_IP_, ret,
>  		      size, PAGE_SIZE << get_order(size),
> @@ -1078,14 +1078,9 @@ void *kmalloc_large(size_t size, gfp_t flags)
>  }
>  EXPORT_SYMBOL(kmalloc_large);
>  
> -void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
> -{
> -	return __kmalloc_large_node_notrace(size, flags, node);
> -}
> -
>  void *kmalloc_large_node(size_t size, gfp_t flags, int node)
>  {
> -	void *ret = __kmalloc_large_node_notrace(size, flags, node);
> +	void *ret = __kmalloc_large_node(size, flags, node);
>  
>  	trace_kmalloc(_RET_IP_, ret,
>  		      size, PAGE_SIZE << get_order(size),

-- 
Thanks,
Hyeonggon

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 14/16] mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using
  2022-07-29 11:23   ` Vlastimil Babka
@ 2022-08-02  9:22     ` Hyeonggon Yoo
  0 siblings, 0 replies; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-08-02  9:22 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, Joe Perches, Vasily Averin,
	Matthew WilCox, linux-kernel, linux-mm

On Fri, Jul 29, 2022 at 01:23:38PM +0200, Vlastimil Babka wrote:
> On 7/12/22 15:39, Hyeonggon Yoo wrote:
> > Drop kmem_alloc event class, and define kmalloc and kmem_cache_alloc
> > using TRACE_EVENT() macro.
> > 
> > And then this patch does:
> >    - Do not pass pointer to struct kmem_cache to trace_kmalloc.
> >      gfp flag is enough to know if it's accounted or not.
> >    - Avoid dereferencing s->object_size and s->size when not using kmem_cache_alloc event.
> >    - Avoid dereferencing s->name when not using kmem_cache_free event.
> > 
> > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> 
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> 
> With some comments:
> 
> > ---
> >  include/trace/events/kmem.h | 64 +++++++++++++++++++++++++------------
> >  mm/slab.c                   | 16 ++++------
> >  mm/slab_common.c            | 17 ++++++----
> >  mm/slob.c                   | 16 +++-------
> >  mm/slub.c                   | 13 +++-----
> >  5 files changed, 70 insertions(+), 56 deletions(-)
> > 
> > diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
> > index e078ebcdc4b1..0ded2c351062 100644
> > --- a/include/trace/events/kmem.h
> > +++ b/include/trace/events/kmem.h
> > @@ -9,17 +9,15 @@
> >  #include <linux/tracepoint.h>
> >  #include <trace/events/mmflags.h>
> >  
> > -DECLARE_EVENT_CLASS(kmem_alloc,
> > +TRACE_EVENT(kmem_cache_alloc,
> >  
> >  	TP_PROTO(unsigned long call_site,
> >  		 const void *ptr,
> >  		 struct kmem_cache *s,
> > -		 size_t bytes_req,
> > -		 size_t bytes_alloc,
> >  		 gfp_t gfp_flags,
> >  		 int node),
> >  
> > -	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node),
> > +	TP_ARGS(call_site, ptr, s, gfp_flags, node),
> >  
> >  	TP_STRUCT__entry(
> >  		__field(	unsigned long,	call_site	)
> > @@ -34,8 +32,8 @@ DECLARE_EVENT_CLASS(kmem_alloc,
> >  	TP_fast_assign(
> >  		__entry->call_site	= call_site;
> >  		__entry->ptr		= ptr;
> > -		__entry->bytes_req	= bytes_req;
> > -		__entry->bytes_alloc	= bytes_alloc;
> > +		__entry->bytes_req	= s->object_size;
> > +		__entry->bytes_alloc	= s->size;
> >  		__entry->gfp_flags	= (__force unsigned long)gfp_flags;
> >  		__entry->node		= node;
> >  		__entry->accounted	= IS_ENABLED(CONFIG_MEMCG_KMEM) ?
> 
> This assignment continues by ...
> 	(s && s->flags & SLAB_ACCOUNT)) : false;
> 
> We could now remove the "s &&" as we assume it's non-NULL anyway.

Right. will adjust. 

> 
> > @@ -53,22 +51,46 @@ DECLARE_EVENT_CLASS(kmem_alloc,
> >  		__entry->accounted ? "true" : "false")
> >  );
> >  
> > -DEFINE_EVENT(kmem_alloc, kmalloc,
> > +TRACE_EVENT(kmalloc,
> >  
> > -	TP_PROTO(unsigned long call_site, const void *ptr,
> > -		 struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
> > -		 gfp_t gfp_flags, int node),
> > +	TP_PROTO(unsigned long call_site,
> > +		 const void *ptr,
> > +		 size_t bytes_req,
> > +		 size_t bytes_alloc,
> > +		 gfp_t gfp_flags,
> > +		 int node),
> >  
> > -	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
> > -);
> > +	TP_ARGS(call_site, ptr, bytes_req, bytes_alloc, gfp_flags, node),
> >  
> > -DEFINE_EVENT(kmem_alloc, kmem_cache_alloc,
> > +	TP_STRUCT__entry(
> > +		__field(	unsigned long,	call_site	)
> > +		__field(	const void *,	ptr		)
> > +		__field(	size_t,		bytes_req	)
> > +		__field(	size_t,		bytes_alloc	)
> > +		__field(	unsigned long,	gfp_flags	)
> > +		__field(	int,		node		)
> > +		__field(	bool,		accounted	)
> > +	),
> >  
> > -	TP_PROTO(unsigned long call_site, const void *ptr,
> > -		 struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
> > -		 gfp_t gfp_flags, int node),
> > +	TP_fast_assign(
> > +		__entry->call_site	= call_site;
> > +		__entry->ptr		= ptr;
> > +		__entry->bytes_req	= bytes_req;
> > +		__entry->bytes_alloc	= bytes_alloc;
> > +		__entry->gfp_flags	= (__force unsigned long)gfp_flags;
> > +		__entry->node		= node;
> > +		__entry->accounted	= IS_ENABLED(CONFIG_MEMCG_KMEM) ?
> > +					  (gfp_flags & __GFP_ACCOUNT) : false;
> 
> This test could be perhaps moved to TP_printk and the accounted field
> removed completely, but we need to make sure trace-cmd (which reads the
> binary entries and does the printk in userspace) can interpret the tests
> properly, including the IS_ENABLED(CONFIG_MEMCG_KMEM) part.

Okay, I'll try moving it and check if trace-cmd could understand it.

> 
> > +	),
> >  
> > -	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
> > +	TP_printk("call_site=%pS ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d accounted=%s",
> > +		(void *)__entry->call_site,
> > +		__entry->ptr,
> > +		__entry->bytes_req,
> > +		__entry->bytes_alloc,
> > +		show_gfp_flags(__entry->gfp_flags),
> > +		__entry->node,
> > +		__entry->accounted ? "true" : "false")
> >  );
> >  
> >  TRACE_EVENT(kfree,
> > @@ -93,20 +115,20 @@ TRACE_EVENT(kfree,
> >  
> >  TRACE_EVENT(kmem_cache_free,
> >  
> > -	TP_PROTO(unsigned long call_site, const void *ptr, const char *name),
> > +	TP_PROTO(unsigned long call_site, const void *ptr, const struct kmem_cache *s),
> >  
> > -	TP_ARGS(call_site, ptr, name),
> > +	TP_ARGS(call_site, ptr, s),
> >  
> >  	TP_STRUCT__entry(
> >  		__field(	unsigned long,	call_site	)
> >  		__field(	const void *,	ptr		)
> > -		__string(	name,	name	)
> > +		__string(	name,		s->name		)
> >  	),
> >  
> >  	TP_fast_assign(
> >  		__entry->call_site	= call_site;
> >  		__entry->ptr		= ptr;
> > -		__assign_str(name, name);
> > +		__assign_str(name, s->name);
> >  	),
> >  
> >  	TP_printk("call_site=%pS ptr=%p name=%s",
> > diff --git a/mm/slab.c b/mm/slab.c
> > index f9b74831e3f4..1685e5507a59 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -3447,8 +3447,7 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
> >  {
> >  	void *ret = slab_alloc(cachep, lru, flags, cachep->object_size, _RET_IP_);
> >  
> > -	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, cachep->object_size,
> > -			       cachep->size, flags, NUMA_NO_NODE);
> > +	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, flags, NUMA_NO_NODE);
> >  
> >  	return ret;
> >  }
> > @@ -3537,8 +3536,9 @@ kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
> >  	ret = slab_alloc(cachep, NULL, flags, size, _RET_IP_);
> >  
> >  	ret = kasan_kmalloc(cachep, ret, size, flags);
> > -	trace_kmalloc(_RET_IP_, ret, cachep,
> > -		      size, cachep->size, flags, NUMA_NO_NODE);
> > +	trace_kmalloc(_RET_IP_, ret,
> > +		      size, cachep->size,
> > +		      flags, NUMA_NO_NODE);
> 
> This could now fit on 2, maybe even 1 line?
> Other trace_* calls could be perhaps also reduced, please check.
>

Okay, will check.
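
For the record, this particular call does fit on a single line within the
80-column limit:

	trace_kmalloc(_RET_IP_, ret, size, cachep->size, flags, NUMA_NO_NODE);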

> >  	return ret;
> >  }
> >  EXPORT_SYMBOL(kmem_cache_alloc_trace);
> > @@ -3561,9 +3561,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
> >  {
> >  	void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);
> >  
> > -	trace_kmem_cache_alloc(_RET_IP_, ret, cachep,
> > -			       cachep->object_size, cachep->size,
> > -			       flags, nodeid);
> > +	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, flags, nodeid);
> >  
> >  	return ret;
> >  }
> > @@ -3588,7 +3586,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
> >  	ret = slab_alloc_node(cachep, NULL, flags, nodeid, size, _RET_IP_);
> >  
> >  	ret = kasan_kmalloc(cachep, ret, size, flags);
> > -	trace_kmalloc(_RET_IP_, ret, cachep,
> > +	trace_kmalloc(_RET_IP_, ret,
> >  		      size, cachep->size,
> >  		      flags, nodeid);
> >  	return ret;
> > @@ -3652,7 +3650,7 @@ void kmem_cache_free(struct kmem_cache *cachep, void *objp)
> >  	if (!cachep)
> >  		return;
> >  
> > -	trace_kmem_cache_free(_RET_IP_, objp, cachep->name);
> > +	trace_kmem_cache_free(_RET_IP_, objp, cachep);
> >  	__do_kmem_cache_free(cachep, objp, _RET_IP_);
> >  }
> >  EXPORT_SYMBOL(kmem_cache_free);
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 0e66b4911ebf..c01c6b8f0d34 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -933,7 +933,7 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
> >  
> >  	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
> >  		ret = kmalloc_large_node_notrace(size, flags, node);
> > -		trace_kmalloc(_RET_IP_, ret, NULL,
> > +		trace_kmalloc(_RET_IP_, ret,
> >  			      size, PAGE_SIZE << get_order(size),
> >  			      flags, node);
> >  		return ret;
> > @@ -946,8 +946,9 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
> >  
> >  	ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
> >  	ret = kasan_kmalloc(s, ret, size, flags);
> > -	trace_kmalloc(_RET_IP_, ret, s, size,
> > -		      s->size, flags, node);
> > +	trace_kmalloc(_RET_IP_, ret,
> > +		      size, s->size,
> > +		      flags, node);
> >  	return ret;
> >  }
> >  
> > @@ -1075,8 +1076,9 @@ void *kmalloc_large(size_t size, gfp_t flags)
> >  {
> >  	void *ret = __kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
> >  
> > -	trace_kmalloc(_RET_IP_, ret, NULL, size,
> > -		      PAGE_SIZE << get_order(size), flags, NUMA_NO_NODE);
> > +	trace_kmalloc(_RET_IP_, ret,
> > +		      size, PAGE_SIZE << get_order(size),
> > +		      flags, NUMA_NO_NODE);
> >  	return ret;
> >  }
> >  EXPORT_SYMBOL(kmalloc_large);
> > @@ -1090,8 +1092,9 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> >  {
> >  	void *ret = __kmalloc_large_node_notrace(size, flags, node);
> >  
> > -	trace_kmalloc(_RET_IP_, ret, NULL, size,
> > -		      PAGE_SIZE << get_order(size), flags, node);
> > +	trace_kmalloc(_RET_IP_, ret,
> > +		      size, PAGE_SIZE << get_order(size),
> > +		      flags, node);
> >  	return ret;
> >  }
> >  EXPORT_SYMBOL(kmalloc_large_node);
> > diff --git a/mm/slob.c b/mm/slob.c
> > index a4d50d957c25..97a4d2407f96 100644
> > --- a/mm/slob.c
> > +++ b/mm/slob.c
> > @@ -507,8 +507,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
> >  		*m = size;
> >  		ret = (void *)m + minalign;
> >  
> > -		trace_kmalloc(caller, ret, NULL,
> > -			      size, size + minalign, gfp, node);
> > +		trace_kmalloc(caller, ret, size, size + minalign, gfp, node);
> >  	} else {
> >  		unsigned int order = get_order(size);
> >  
> > @@ -516,8 +515,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
> >  			gfp |= __GFP_COMP;
> >  		ret = slob_new_pages(gfp, order, node);
> >  
> > -		trace_kmalloc(caller, ret, NULL,
> > -			      size, PAGE_SIZE << order, gfp, node);
> > +		trace_kmalloc(caller, ret, size, PAGE_SIZE << order, gfp, node);
> >  	}
> >  
> >  	kmemleak_alloc(ret, size, 1, gfp);
> > @@ -608,14 +606,10 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
> >  
> >  	if (c->size < PAGE_SIZE) {
> >  		b = slob_alloc(c->size, flags, c->align, node, 0);
> > -		trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
> > -				       SLOB_UNITS(c->size) * SLOB_UNIT,
> 
> Hm we lose the SLOB_UNITS() adjustment here.

Thank you for catching this.

> Maybe we don't care that much,
> maybe simply c->size could be already stored with SLOB_UNITS() (but didn't
> check if other stuff breaks with that).

Hmm, I'll try it that way!
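
Something along these lines, I guess (completely untested sketch on top of
the current __kmem_cache_create() in mm/slob.c, just to illustrate storing
the rounded-up size at cache creation time):

	int __kmem_cache_create(struct kmem_cache *c, slab_flags_t flags)
	{
		if (flags & SLAB_TYPESAFE_BY_RCU) {
			/* leave room for rcu footer at the end of object */
			c->size += sizeof(struct slob_rcu);
		}
		c->flags = flags;
		/*
		 * Hypothetical: round up to SLOB units here so that s->size
		 * reported by the tracepoint matches what the old
		 * SLOB_UNITS(c->size) * SLOB_UNIT argument used to report.
		 */
		if (c->size < PAGE_SIZE)
			c->size = SLOB_UNITS(c->size) * SLOB_UNIT;
		return 0;
	}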

Thanks!

> 
> Thanks!
> 
> > -				       flags, node);
> > +		trace_kmem_cache_alloc(_RET_IP_, b, c, flags, node);
> >  	} else {
> >  		b = slob_new_pages(flags, get_order(c->size), node);
> > -		trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
> > -				       PAGE_SIZE << get_order(c->size),
> > -				       flags, node);
> > +		trace_kmem_cache_alloc(_RET_IP_, b, c, flags, node);
> >  	}
> >  
> >  	if (b && c->ctor) {
> > @@ -671,7 +665,7 @@ static void kmem_rcu_free(struct rcu_head *head)
> >  void kmem_cache_free(struct kmem_cache *c, void *b)
> >  {
> >  	kmemleak_free_recursive(b, c->flags);
> > -	trace_kmem_cache_free(_RET_IP_, b, c->name);
> > +	trace_kmem_cache_free(_RET_IP_, b, c);
> >  	if (unlikely(c->flags & SLAB_TYPESAFE_BY_RCU)) {
> >  		struct slob_rcu *slob_rcu;
> >  		slob_rcu = b + (c->size - sizeof(struct slob_rcu));
> > diff --git a/mm/slub.c b/mm/slub.c
> > index f1aa51480dc4..a49c69469c64 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3243,8 +3243,7 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> >  {
> >  	void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
> >  
> > -	trace_kmem_cache_alloc(_RET_IP_, ret, s, s->object_size,
> > -				s->size, gfpflags, NUMA_NO_NODE);
> > +	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
> >  
> >  	return ret;
> >  }
> > @@ -3274,7 +3273,7 @@ void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
> >  void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
> >  {
> >  	void *ret = slab_alloc(s, NULL, gfpflags, _RET_IP_, size);
> > -	trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, NUMA_NO_NODE);
> > +	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
> >  	ret = kasan_kmalloc(s, ret, size, gfpflags);
> >  	return ret;
> >  }
> > @@ -3285,8 +3284,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
> >  {
> >  	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
> >  
> > -	trace_kmem_cache_alloc(_RET_IP_, ret, s,
> > -			       s->object_size, s->size, gfpflags, node);
> > +	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, node);
> >  
> >  	return ret;
> >  }
> > @@ -3299,8 +3297,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
> >  {
> >  	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
> >  
> > -	trace_kmalloc(_RET_IP_, ret, s,
> > -		      size, s->size, gfpflags, node);
> > +	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);
> >  
> >  	ret = kasan_kmalloc(s, ret, size, gfpflags);
> >  	return ret;
> > @@ -3544,7 +3541,7 @@ void kmem_cache_free(struct kmem_cache *s, void *x)
> >  	s = cache_from_obj(s, x);
> >  	if (!s)
> >  		return;
> > -	trace_kmem_cache_free(_RET_IP_, x, s->name);
> > +	trace_kmem_cache_free(_RET_IP_, x, s);
> >  	slab_free(s, virt_to_slab(x), x, NULL, &x, 1, _RET_IP_);
> >  }
> >  EXPORT_SYMBOL(kmem_cache_free);
> 

-- 
Thanks,
Hyeonggon

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 15/16] mm/slab_common: move definition of __ksize() to mm/slab.h
  2022-07-29 11:47   ` Vlastimil Babka
@ 2022-08-02  9:25     ` Hyeonggon Yoo
  0 siblings, 0 replies; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-08-02  9:25 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, Joe Perches, Vasily Averin,
	Matthew WilCox, linux-kernel, linux-mm

On Fri, Jul 29, 2022 at 01:47:23PM +0200, Vlastimil Babka wrote:
> On 7/12/22 15:39, Hyeonggon Yoo wrote:
> > __ksize() is only called by KASAN. Remove export symbol and move
> > definition to mm/slab.h as we don't want to grow its callers.
> > 
> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Nit: The subject and changelog should probably say declaration, not definition.

Ah, right, will do.
Thanks!

> 
> > ---
> >  include/linux/slab.h |  1 -
> >  mm/slab.h            |  2 ++
> >  mm/slab_common.c     | 11 +----------
> >  mm/slob.c            |  1 -
> >  4 files changed, 3 insertions(+), 12 deletions(-)
> > 
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index 4ee5b2fed164..701fe538650f 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -187,7 +187,6 @@ int kmem_cache_shrink(struct kmem_cache *s);
> >  void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __alloc_size(2);
> >  void kfree(const void *objp);
> >  void kfree_sensitive(const void *objp);
> > -size_t __ksize(const void *objp);
> >  size_t ksize(const void *objp);
> >  #ifdef CONFIG_PRINTK
> >  bool kmem_valid_obj(void *object);
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 9193e9c1f040..ad634e02b3cb 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -678,6 +678,8 @@ void free_large_kmalloc(struct folio *folio, void *object);
> >  
> >  #endif /* CONFIG_SLOB */
> >  
> > +size_t __ksize(const void *objp);
> > +
> >  static inline size_t slab_ksize(const struct kmem_cache *s)
> >  {
> >  #ifndef CONFIG_SLUB
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index c01c6b8f0d34..1f8db7959366 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -1003,15 +1003,7 @@ void kfree(const void *object)
> >  }
> >  EXPORT_SYMBOL(kfree);
> >  
> > -/**
> > - * __ksize -- Uninstrumented ksize.
> > - * @objp: pointer to the object
> > - *
> > - * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
> > - * safety checks as ksize() with KASAN instrumentation enabled.
> > - *
> > - * Return: size of the actual memory used by @objp in bytes
> > - */
> > +/* Uninstrumented ksize. Only called by KASAN. */
> >  size_t __ksize(const void *object)
> >  {
> >  	struct folio *folio;
> > @@ -1026,7 +1018,6 @@ size_t __ksize(const void *object)
> >  
> >  	return slab_ksize(folio_slab(folio)->slab_cache);
> >  }
> > -EXPORT_SYMBOL(__ksize);
> >  #endif /* !CONFIG_SLOB */
> >  
> >  gfp_t kmalloc_fix_flags(gfp_t flags)
> > diff --git a/mm/slob.c b/mm/slob.c
> > index 97a4d2407f96..91d6e2b19929 100644
> > --- a/mm/slob.c
> > +++ b/mm/slob.c
> > @@ -584,7 +584,6 @@ size_t __ksize(const void *block)
> >  	m = (unsigned int *)(block - align);
> >  	return SLOB_UNITS(*m) * SLOB_UNIT;
> >  }
> > -EXPORT_SYMBOL(__ksize);
> >  
> >  int __kmem_cache_create(struct kmem_cache *c, slab_flags_t flags)
> >  {
> 

-- 
Thanks,
Hyeonggon

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 08/15] mm/slab_common: kmalloc_node: pass large requests to page allocator
  2022-08-02  8:59         ` Hyeonggon Yoo
@ 2022-08-02  9:32           ` Vlastimil Babka
  0 siblings, 0 replies; 49+ messages in thread
From: Vlastimil Babka @ 2022-08-02  9:32 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, Joe Perches, Vasily Averin,
	Matthew WilCox, linux-kernel, linux-mm

On 8/2/22 10:59, Hyeonggon Yoo wrote:
> On Mon, Aug 01, 2022 at 04:44:22PM +0200, Vlastimil Babka wrote:
>> 
> 
> Yeah, uninlining __kmalloc_large_node saves hundreds of bytes.
> And the diff below looks good to me.
> 
> By the way, do you have any opinions on inlining slab_alloc_node()?
> (Looks like a similar topic?)
> 
> AFAIK slab_alloc_node() is inlined in:
>         kmem_cache_alloc()
>         kmem_cache_alloc_node()
>         kmem_cache_alloc_lru()
>         kmem_cache_alloc_trace()
>         kmem_cache_alloc_node_trace()
>         __kmem_cache_alloc_node()
> 
> This is what I get after simply dropping __always_inline in slab_alloc_node:
> 
> add/remove: 1/1 grow/shrink: 3/6 up/down: 1911/-5275 (-3364)
> Function                                     old     new   delta
> slab_alloc_node                                -    1356   +1356
> sysfs_slab_alias                             134     327    +193
> slab_memory_callback                         528     717    +189
> __kmem_cache_create                         1325    1498    +173
> __slab_alloc.constprop                       135       -    -135
> kmem_cache_alloc_trace                       909     196    -713
> kmem_cache_alloc                             937     191    -746
> kmem_cache_alloc_node_trace                 1020     200    -820
> __kmem_cache_alloc_node                      862      19    -843
> kmem_cache_alloc_node                       1046     189    -857
> kmem_cache_alloc_lru                        1348     187   -1161
> Total: Before=32011183, After=32007819, chg -0.01%
> 
> So 3.28kB is the cost of eliminating function call overhead in the
> fastpath.
> 
> This is a tradeoff between function call overhead and
> instruction cache usage...

We can investigate this afterwards, with proper measurements etc. I think
it's more sensitive than kmalloc_large_node.
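
(For anyone following along: numbers like the ones quoted above are
presumably produced by comparing two vmlinux builds with the in-tree
script, e.g.

	./scripts/bloat-o-meter vmlinux.before vmlinux.after

where vmlinux.before/vmlinux.after are placeholder names for the builds
without and with the change.)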

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 00/15] common kmalloc v3
  2022-07-29 15:08 ` [PATCH v3 00/15] common kmalloc v3 Vlastimil Babka
@ 2022-08-14 10:06   ` Hyeonggon Yoo
  2022-08-15 12:59     ` Vlastimil Babka
  0 siblings, 1 reply; 49+ messages in thread
From: Hyeonggon Yoo @ 2022-08-14 10:06 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, Joe Perches, Vasily Averin,
	Matthew WilCox, linux-kernel, linux-mm

On Fri, Jul 29, 2022 at 05:08:31PM +0200, Vlastimil Babka wrote:
> On 7/12/22 15:39, Hyeonggon Yoo wrote:
> > This is v3 of common kmalloc series.
> > 
> > This series generalize most of kmalloc functions and move its
> > implementation into mm/slab_common.c.
> > 
> > I believe this series give better maintainability of code for SLAB and SLUB.
> > Please consider applying.
> > 
> > This series is based on slab/for-next and also available at
> > https://github.com/hygoni/linux/tree/slab-common-v3r0
> > 
> > Any feedbacks/reviews will be appreciated.
> 
> Hi, thanks for all your efforts. It's shaping up nicely so I think the next
> version will be ready to be added to -next after the 5.20 merge window.
> As I've finished the individual reviews, I'm looking at the result and see a
> bit more potential for cleanups, which could be perhaps incorporated to
> existing patches, or additionally:

Thank you for the reviews, and I too would like to add it to -next soon!

> 
> - kmalloc_large_node_notrace() has only one caller, can be removed and the
> caller can call __kmalloc_large_node_notrace() directly, especially if it's
> not __always_inline as I've IIRC suggested.

Will adjust in the next version.

> - kmem_cache_alloc_trace() and kmem_cache_alloc_node_trace() are weird ones,
> they are in fact for kmalloc despite the name.

Yeah, I've been wanting to rename them to kmalloc_trace() and
kmalloc_node_trace().

> They depend on
> CONFIG_TRACING, yet if you look carefully, the !CONFIG_TRACING variant also
> goes through a trace_* function. The actual difference seems that
> slab_alloc() thus kasan_alloc() and kfence_alloc() don't get the orig_size
> that way, which is dubious. It might be worth trying to unify this as well?
> E.g.
>   - use only the CONFIG_TRACING variant, discard the other

Sounds okay.

>   - declare it in mm/slab.h, this is not for general usage

We can't completely remove it because it's needed in include/linux/slab.h
for the inlined kmalloc().
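
For context, the constant-size kmalloc() fast path in include/linux/slab.h
looks roughly like this (simplified), which is why the _trace variant has
to stay visible there:

	static __always_inline void *kmalloc(size_t size, gfp_t flags)
	{
		if (__builtin_constant_p(size)) {
			unsigned int index;

			if (size > KMALLOC_MAX_CACHE_SIZE)
				return kmalloc_large(size, flags);

			index = kmalloc_index(size);
			if (!index)
				return ZERO_SIZE_PTR;

			return kmem_cache_alloc_trace(
					kmalloc_caches[kmalloc_type(flags)][index],
					flags, size);
		}
		return __kmalloc(size, flags);
	}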

>   - single implementation in mm/slab_common.c that calls
>     __kmem_cache_alloc_node() from SLAB/SLUB and does the trace

While I love the idea of a single implementation in mm/slab_common.c,
making use of __kmem_cache_alloc_node() and __kmem_cache_free() adds
a bit of overhead:
	it adds the overhead of a function call and can't benefit from inlining
	(including removing unnecessary parts of the function code)

So... what about including slab_common.c in sl?b.c,
so that the compiler can treat sl?b.c and slab_common.c as a single translation unit?
(Or split the kmalloc implementation into kmalloc.c and do a similar thing?)

Thanks!

---
Hyeonggon

diff --git a/mm/Makefile b/mm/Makefile
index 9a564f836403..bcee8495b531 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -3,7 +3,6 @@
 # Makefile for the linux memory manager.
 #

-KASAN_SANITIZE_slab_common.o := n
 KASAN_SANITIZE_slab.o := n
 KASAN_SANITIZE_slub.o := n
 KCSAN_SANITIZE_kmemleak.o := n
@@ -11,7 +10,6 @@ KCSAN_SANITIZE_kmemleak.o := n
 # These produce frequent data race reports: most of them are due to races on
 # the same word but accesses to different bits of that word. Re-enable KCSAN
 # for these when we have more consensus on what to do about them.
-KCSAN_SANITIZE_slab_common.o := n
 KCSAN_SANITIZE_slab.o := n
 KCSAN_SANITIZE_slub.o := n
 KCSAN_SANITIZE_page_alloc.o := n
@@ -21,7 +19,6 @@ KCSAN_INSTRUMENT_BARRIERS := y
 # These files are disabled because they produce non-interesting and/or
 # flaky coverage that is not a function of syscall inputs. E.g. slab is out of
 # free pages, or a task is migrated between nodes.
-KCOV_INSTRUMENT_slab_common.o := n
 KCOV_INSTRUMENT_slob.o := n
 KCOV_INSTRUMENT_slab.o := n
 KCOV_INSTRUMENT_slub.o := n
@@ -51,8 +48,7 @@ obj-y                 := filemap.o mempool.o oom_kill.o fadvise.o \
                           maccess.o page-writeback.o folio-compat.o \
                           readahead.o swap.o truncate.o vmscan.o shmem.o \
                           util.o mmzone.o vmstat.o backing-dev.o \
-                          mm_init.o percpu.o slab_common.o \
-                          compaction.o vmacache.o \
+                          mm_init.o percpu.o compaction.o vmacache.o \
                           interval_tree.o list_lru.o workingset.o \
                           debug.o gup.o mmap_lock.o $(mmu-y)

diff --git a/mm/slab.c b/mm/slab.c
index a5486ff8362a..a302a7c17b40 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -131,6 +131,8 @@

 #include       "slab.h"

+#include       "slab_common.c"
+
 /*
  * DEBUG       - 1 for kmem_cache_create() to honour; SLAB_RED_ZONE & SLAB_POISON.
  *               0 for faster, smaller code (especially in the critical paths).
@@ -3541,6 +3543,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 }
 EXPORT_SYMBOL(kmem_cache_alloc_node);

+static __always_inline
 void *__kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
                             int nodeid, size_t orig_size,
                             unsigned long caller)
@@ -3585,6 +3588,7 @@ void __do_kmem_cache_free(struct kmem_cache *cachep, void *objp,
        local_irq_restore(flags);
 }

+static __always_inline
 void __kmem_cache_free(struct kmem_cache *cachep, void *objp,
                       unsigned long caller)
 {
diff --git a/mm/slab.h b/mm/slab.h
index 65023f000d42..4e43ab717d99 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -273,11 +273,6 @@ void create_kmalloc_caches(slab_flags_t);

 /* Find the kmalloc slab corresponding for a certain size */
 struct kmem_cache *kmalloc_slab(size_t, gfp_t);
-
-void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
-                             int node, size_t orig_size,
-                             unsigned long caller);
-void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller);
 #endif

 gfp_t kmalloc_fix_flags(gfp_t flags);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 83cfe91b6ab6..70d7393d31ae 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -898,6 +898,12 @@ void free_large_kmalloc(struct folio *folio, void *object)
        __free_pages(folio_page(folio, 0), order);
 }

+static __always_inline
+void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node,
+                             size_t orig_size, unsigned long caller);
+static __always_inline
+void __kmem_cache_free(struct kmem_cache *s, void *objp, unsigned long caller);
+
 static void *__kmalloc_large_node(size_t size, gfp_t flags, int node);
 static __always_inline
 void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
diff --git a/mm/slob.c b/mm/slob.c
index 45a061b8ba38..656f4f8b77d9 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -74,6 +74,9 @@
 #include <linux/atomic.h>

 #include "slab.h"
+
+#include "slab_common.c"
+
 /*
  * slob_block has a field 'units', which indicates size of block if +ve,
  * or offset of next block if -ve (in SLOB_UNITs).
diff --git a/mm/slub.c b/mm/slub.c
index 8083a6ee5f15..bd47a30049ae 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -45,6 +45,8 @@

 #include "internal.h"

+#include "slab_common.c"
+
 /*
  * Lock order:
  *   1. slab_mutex (Global Mutex)
@@ -3261,6 +3263,7 @@ void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
 }
 EXPORT_SYMBOL(kmem_cache_alloc_lru);

+static __always_inline
 void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
                              int node, size_t orig_size,
                              unsigned long caller)
@@ -3505,6 +3508,7 @@ void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr)
 }
 #endif

+static __always_inline
 void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller)
 {
        slab_free(s, virt_to_slab(x), x, NULL, &x, 1, caller);

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 00/15] common kmalloc v3
  2022-08-14 10:06   ` Hyeonggon Yoo
@ 2022-08-15 12:59     ` Vlastimil Babka
  0 siblings, 0 replies; 49+ messages in thread
From: Vlastimil Babka @ 2022-08-15 12:59 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, Joe Perches, Vasily Averin,
	Matthew WilCox, linux-kernel, linux-mm

On 8/14/22 12:06, Hyeonggon Yoo wrote:
> On Fri, Jul 29, 2022 at 05:08:31PM +0200, Vlastimil Babka wrote:
>> On 7/12/22 15:39, Hyeonggon Yoo wrote:
>>> This is v3 of common kmalloc series.
>>>
>>> This series generalize most of kmalloc functions and move its
>>> implementation into mm/slab_common.c.
>>>
>>> I believe this series give better maintainability of code for SLAB and SLUB.
>>> Please consider applying.
>>>
>>> This series is based on slab/for-next and also available at
>>> https://github.com/hygoni/linux/tree/slab-common-v3r0
>>>
>>> Any feedbacks/reviews will be appreciated.
>>
>> Hi, thanks for all your efforts. It's shaping up nicely so I think the next
>> version will be ready to be added to -next after the 5.20 merge window.
>> As I've finished the individual reviews, I'm looking at the result and see a
>> bit more potential for cleanups, which could be perhaps incorporated to
>> existing patches, or additionally:
> 
> Thank you for reviews and I too would like to add it to -next soon!
> 
>>
>> - kmalloc_large_node_notrace() has only one caller, can be removed and the
>> caller can call __kmalloc_large_node_notrace() directly, especially if it's
>> not __always_inline as I've IIRC suggested.
> 
> Will adjust in next version.
> 
>> - kmem_cache_alloc_trace() and kmem_cache_alloc_node_trace() are weird ones,
>> they are in fact for kmalloc despite the name.
> 
> Yeah, I'm the one that would like to rename it to kmalloc_trace() and
> kmalloc_node_trace().
> 
>> They depend on
>> CONFIG_TRACING, yet if you look carefully, the !CONFIG_TRACING variant also
>> goes through a trace_* function. The actual difference seems that
>> slab_alloc() thus kasan_alloc() and kfence_alloc() don't get the orig_size
>> that way, which is dubious. It might be worth trying to unify this as well?
>> E.g.
>>   - use only the CONFIG_TRACING variant, discard the other
> 
> Sounds okay.
> 
>>   - declare it in mm/slab.h, this is not for general usage
> 
> We can't completely remove it because it's needed in include/linux/slab.h
> for the inlined kmalloc().

Ah, ok.

>>   - single implementation in mm/slab_common.c that calls
>>     __kmem_cache_alloc_node() from SLAB/SLUB and does the trace
> 
> While I love the idea of a single implementation in mm/slab_common.c,
> making use of __kmem_cache_alloc_node() and __kmem_cache_free() adds
> a bit of overhead:
> 	it adds the overhead of a function call and can't benefit from inlining
> 	(including removing unnecessary parts of the function code)

Hm, right.

> So... what about including slab_common.c in sl?b.c,
> so that the compiler can treat sl?b.c and slab_common.c as a single translation unit?
> (Or split the kmalloc implementation into kmalloc.c and do a similar thing?)

I don't know if that has a good precedent in the kernel. Maybe we can
postpone these more radical attempts to a later series.

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread

Thread overview: 49+ messages
2022-07-12 13:39 [PATCH v3 00/15] common kmalloc v3 Hyeonggon Yoo
2022-07-12 13:39 ` [PATCH v3 1/15] mm/slab: move NUMA-related code to __do_cache_alloc() Hyeonggon Yoo
2022-07-12 14:29   ` Christoph Lameter
2022-07-13  9:39     ` Hyeonggon Yoo
2022-07-12 13:39 ` [PATCH v3 2/15] mm/slab: cleanup slab_alloc() and slab_alloc_node() Hyeonggon Yoo
2022-07-12 13:39 ` [PATCH v3 03/15] mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions Hyeonggon Yoo
2022-07-12 13:39 ` [PATCH v3 04/15] mm/slab_common: cleanup kmalloc_track_caller() Hyeonggon Yoo
2022-07-12 13:39 ` [PATCH v3 05/15] mm/sl[au]b: factor out __do_kmalloc_node() Hyeonggon Yoo
2022-07-28 14:45   ` Vlastimil Babka
2022-07-12 13:39 ` [PATCH v3 06/15] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large() Hyeonggon Yoo
2022-07-28 15:23   ` Vlastimil Babka
2022-08-01 13:26     ` Hyeonggon Yoo
2022-08-01 13:36       ` Vlastimil Babka
2022-08-02  2:54         ` Hyeonggon Yoo
2022-07-12 13:39 ` [PATCH v3 07/15] mm/slub: move kmalloc_large_node() to slab_common.c Hyeonggon Yoo
2022-07-12 13:39 ` [PATCH v3 08/15] mm/slab_common: kmalloc_node: pass large requests to page allocator Hyeonggon Yoo
2022-07-28 16:09   ` Vlastimil Babka
2022-08-01 14:37     ` Hyeonggon Yoo
2022-08-01 14:44       ` Vlastimil Babka
2022-08-02  8:59         ` Hyeonggon Yoo
2022-08-02  9:32           ` Vlastimil Babka
2022-07-12 13:39 ` [PATCH v3 09/15] mm/slab_common: cleanup kmalloc_large() Hyeonggon Yoo
2022-07-28 16:13   ` Vlastimil Babka
2022-07-12 13:39 ` [PATCH v3 10/15] mm/slab: kmalloc: pass requests larger than order-1 page to page allocator Hyeonggon Yoo
2022-07-28 16:25   ` Vlastimil Babka
2022-07-12 13:39 ` [PATCH v3 11/15] mm/sl[au]b: introduce common alloc/free functions without tracepoint Hyeonggon Yoo
2022-07-29  9:49   ` Vlastimil Babka
2022-07-12 13:39 ` [PATCH v3 12/15] mm/sl[au]b: generalize kmalloc subsystem Hyeonggon Yoo
2022-07-29 10:25   ` Vlastimil Babka
2022-07-12 13:39 ` [PATCH v3 13/15] mm/slab_common: unify NUMA and UMA version of tracepoints Hyeonggon Yoo
2022-07-29 10:52   ` Vlastimil Babka
2022-07-12 13:39 ` [PATCH 14/16] mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using Hyeonggon Yoo
2022-07-29 11:23   ` Vlastimil Babka
2022-08-02  9:22     ` Hyeonggon Yoo
2022-07-12 13:39 ` [PATCH 15/16] mm/slab_common: move definition of __ksize() to mm/slab.h Hyeonggon Yoo
2022-07-29 11:47   ` Vlastimil Babka
2022-08-02  9:25     ` Hyeonggon Yoo
2022-07-12 13:39 ` [PATCH 16/16] mm/sl[au]b: check if large object is valid in __ksize() Hyeonggon Yoo
2022-07-12 15:13   ` Christoph Lameter
2022-07-13  9:25     ` Hyeonggon Yoo
2022-07-13 10:07       ` Christoph Lameter
2022-07-13 10:33         ` Marco Elver
2022-07-14  9:15           ` Christoph Lameter
2022-07-14 10:30             ` Marco Elver
2022-07-20 10:05               ` Hyeonggon Yoo
2022-07-29 11:50   ` Vlastimil Babka
2022-07-29 15:08 ` [PATCH v3 00/15] common kmalloc v3 Vlastimil Babka
2022-08-14 10:06   ` Hyeonggon Yoo
2022-08-15 12:59     ` Vlastimil Babka
