* [PATCH v4 00/17] common kmalloc v4
@ 2022-08-17 10:18 Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 01/17] mm/slab: move NUMA-related code to __do_cache_alloc() Hyeonggon Yoo
                   ` (17 more replies)
  0 siblings, 18 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel

v3: https://lore.kernel.org/lkml/20220712133946.307181-1-42.hyeyoo@gmail.com/

Hello, this is common kmalloc v4.
Please review and consider applying.

Changes from v3 are shown as a range-diff below.
(The range-diff does not include the new patch 13.)

v3 -> v4:
	
	- Rebased to commit 3cc40a443a04d5 (after 6.0-rc1)

	- Added Reviewed-by: tags from Vlastimil Babka. Thanks!

	- Addressed comments from Vlastimil Babka:

		- uninline __kmalloc_large_node_notrace()
		
		- Adjust s->size to SLOB_UNITS(s->size) * SLOB_UNIT in
		  SLOB.

		- do not pass __GFP_COMP to tracepoint in
		  kmalloc_large() and friends

		- defer testing of 'accounted' until TP_printk time in
		  trace_kmalloc()

		- rename kmem_cache_alloc[_node]_trace() to
		  kmalloc[_node]_trace(), move it to slab_common.c, and
		  use __assume_kmalloc_alignment instead of
		  __assume_slab_alignment (new patch 13)

		- replace the word 'definition' with 'declaration' in
		  the changelog

	- Addressed comment from Christoph Lameter:
		- replace WARN_ON() with BUG_ON()


Any feedback/suggestions would be appreciated!

Thanks!

Hyeonggon Yoo (17):
  mm/slab: move NUMA-related code to __do_cache_alloc()
  mm/slab: cleanup slab_alloc() and slab_alloc_node()
  mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions
  mm/slab_common: cleanup kmalloc_track_caller()
  mm/sl[au]b: factor out __do_kmalloc_node()
  mm/slab_common: fold kmalloc_order_trace() into kmalloc_large()
  mm/slub: move kmalloc_large_node() to slab_common.c
  mm/slab_common: kmalloc_node: pass large requests to page allocator
  mm/slab_common: cleanup kmalloc_large()
  mm/slab: kmalloc: pass requests larger than order-1 page to page
    allocator
  mm/sl[au]b: introduce common alloc/free functions without tracepoint
  mm/sl[au]b: generalize kmalloc subsystem
  mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace()
  mm/slab_common: unify NUMA and UMA version of tracepoints
  mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not
    using
  mm/slab_common: move declaration of __ksize() to mm/slab.h
  mm/sl[au]b: check if large object is valid in __ksize()

 include/linux/slab.h        | 144 ++++++-----------
 include/trace/events/kmem.h |  74 +++------
 mm/slab.c                   | 305 +++++++++---------------------------
 mm/slab.h                   |  10 ++
 mm/slab_common.c            | 191 +++++++++++++++++++---
 mm/slob.c                   |  31 ++--
 mm/slub.c                   | 234 ++-------------------------
 7 files changed, 338 insertions(+), 651 deletions(-)
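
For reference, the common kmalloc path that SLAB and SLUB end up sharing
once the series is applied looks roughly like the sketch below. It is
assembled from the range-diff fragments of patches 12, 14 and 15 (plus the
existing SLUB code quoted later in this thread) and is simplified for
illustration; it is not a verbatim copy of the resulting mm/slab_common.c:

/* Simplified sketch of the unified helper in mm/slab_common.c */
static __always_inline
void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
{
	struct kmem_cache *s;
	void *ret;

	/* requests larger than order-1 pages go to the page allocator */
	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
		ret = __kmalloc_large_node(size, flags, node);
		trace_kmalloc(_RET_IP_, ret, size,
			      PAGE_SIZE << get_order(size), flags, node);
		return ret;
	}

	/* look up the kmalloc size class */
	s = kmalloc_slab(size, flags);
	if (unlikely(ZERO_OR_NULL_PTR(s)))
		return s;

	/* both SLAB and SLUB now provide __kmem_cache_alloc_node() */
	ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
	ret = kasan_kmalloc(s, ret, size, flags);
	trace_kmalloc(_RET_IP_, ret, size, s->size, flags, node);
	return ret;
}

void *__kmalloc(size_t size, gfp_t flags)
{
	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
}
EXPORT_SYMBOL(__kmalloc);

__kmalloc_node() and the *_track_caller() variants funnel into the same
helper, which is what lets the per-allocator copies in mm/slab.c and
mm/slub.c be removed.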


===== range-diff =====

git range-diff	slab-common-v3r0~16...slab-common-v3r0
		slab-common-v4r0~17...slab-common-v4r0:

 1:  c1ba6a2f28b4 !  1:  0f84e3cefd1a mm/slab: move NUMA-related code to __do_cache_alloc()
    @@ mm/slab.c: static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t fl
      	bool init = false;
      
     @@ mm/slab.c: slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_
    + 		goto out_hooks;
      
    - 	cache_alloc_debugcheck_before(cachep, flags);
      	local_irq_save(save_flags);
     -
     -	if (nodeid == NUMA_NO_NODE)
    @@ mm/slab.c: slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, s
      	return ____cache_alloc(cachep, flags);
      }
     @@ mm/slab.c: slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
    + 		goto out;
      
    - 	cache_alloc_debugcheck_before(cachep, flags);
      	local_irq_save(save_flags);
     -	objp = __do_cache_alloc(cachep, flags);
     +	objp = __do_cache_alloc(cachep, flags, NUMA_NO_NODE);
 2:  75e053d9e62f !  2:  ed66aae2655d mm/slab: cleanup slab_alloc() and slab_alloc_node()
    @@ mm/slab.c: static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t fl
     -	if (unlikely(ptr))
     -		goto out_hooks;
     -
    --	cache_alloc_debugcheck_before(cachep, flags);
     -	local_irq_save(save_flags);
     -	ptr = __do_cache_alloc(cachep, flags, nodeid);
     -	local_irq_restore(save_flags);
    @@ mm/slab.c: __do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid _
      	unsigned long save_flags;
      	void *objp;
     @@ mm/slab.c: slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
    + 		goto out;
      
    - 	cache_alloc_debugcheck_before(cachep, flags);
      	local_irq_save(save_flags);
     -	objp = __do_cache_alloc(cachep, flags, NUMA_NO_NODE);
     +	objp = __do_cache_alloc(cachep, flags, nodeid);
 3:  7db354b38ca6 =  3:  c84d648e0440 mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions
 4:  cdb433c0c7eb =  4:  967dd62b2f55 mm/slab_common: cleanup kmalloc_track_caller()
 5:  46100ebddd00 !  5:  11b3a686bf31 mm/sl[au]b: factor out __do_kmalloc_node()
    @@ Commit message
         __kmalloc_node_track_caller().
     
         Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    +    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
     
      ## mm/slab.c ##
     @@ mm/slab.c: void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
 6:  efc756f837fa !  6:  f30428f4af3d mm/slab_common: fold kmalloc_order_trace() into kmalloc_large()
    @@ mm/slab_common.c: gfp_t kmalloc_fix_flags(gfp_t flags)
      
      	if (unlikely(flags & GFP_SLAB_BUG_MASK))
      		flags = kmalloc_fix_flags(flags);
    + 
    +-	flags |= __GFP_COMP;
    +-	page = alloc_pages(flags, order);
    ++	page = alloc_pages(flags | __GFP_COMP, order);
    + 	if (likely(page)) {
    + 		ret = page_address(page);
    + 		mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
     @@ mm/slab_common.c: void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
      	ret = kasan_kmalloc_large(ret, size, flags);
      	/* As ret might get tagged, call kmemleak hook after KASAN. */
 7:  9e137d787056 =  7:  bd1a17ffce8c mm/slub: move kmalloc_large_node() to slab_common.c
 8:  e48d0b2adad4 !  8:  4a83cf5171f2 mm/slab_common: kmalloc_node: pass large requests to page allocator
    @@ Commit message
         __kmalloc_node_track_caller() when large objects are allocated.
     
         Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    +    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
     
      ## include/linux/slab.h ##
     @@ include/linux/slab.h: static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
    @@ mm/slab_common.c: void *kmalloc_large(size_t size, gfp_t flags)
      EXPORT_SYMBOL(kmalloc_large);
      
     -void *kmalloc_large_node(size_t size, gfp_t flags, int node)
    -+static __always_inline
    -+void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
    ++void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
      {
      	struct page *page;
      	void *ptr = NULL;
    @@ mm/slab_common.c: void *kmalloc_large_node(size_t size, gfp_t flags, int node)
      	return ptr;
      }
     +
    -+void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
    -+{
    -+	return __kmalloc_large_node_notrace(size, flags, node);
    -+}
    -+
     +void *kmalloc_large_node(size_t size, gfp_t flags, int node)
     +{
    -+	void *ret = __kmalloc_large_node_notrace(size, flags, node);
    ++	void *ret = kmalloc_large_node_notrace(size, flags, node);
     +
     +	trace_kmalloc_node(_RET_IP_, ret, NULL, size,
     +			   PAGE_SIZE << get_order(size), flags, node);
 9:  7e813b9c9b0b !  9:  a94e5405bbc5 mm/slab_common: cleanup kmalloc_large()
    @@ Commit message
         mm/slab_common: cleanup kmalloc_large()
     
         Now that kmalloc_large() and kmalloc_large_node() do mostly same job,
    -    make kmalloc_large() wrapper of __kmalloc_large_node_notrace().
    +    make kmalloc_large() wrapper of kmalloc_large_node_notrace().
     
         In the meantime, add missing flag fix code in
    -    __kmalloc_large_node_notrace().
    +    kmalloc_large_node_notrace().
     
         Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    +    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
     
      ## mm/slab_common.c ##
     @@ mm/slab_common.c: gfp_t kmalloc_fix_flags(gfp_t flags)
    @@ mm/slab_common.c: gfp_t kmalloc_fix_flags(gfp_t flags)
     -	if (unlikely(flags & GFP_SLAB_BUG_MASK))
     -		flags = kmalloc_fix_flags(flags);
     -
    --	flags |= __GFP_COMP;
    --	page = alloc_pages(flags, order);
    +-	page = alloc_pages(flags | __GFP_COMP, order);
     -	if (likely(page)) {
     -		ret = page_address(page);
     -		mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
    @@ mm/slab_common.c: gfp_t kmalloc_fix_flags(gfp_t flags)
     -}
     -EXPORT_SYMBOL(kmalloc_large);
      
    - static __always_inline
    - void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
    -@@ mm/slab_common.c: void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
    + void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
    + {
    +@@ mm/slab_common.c: void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
      	void *ptr = NULL;
      	unsigned int order = get_order(size);
      
    @@ mm/slab_common.c: void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, i
      	flags |= __GFP_COMP;
      	page = alloc_pages_node(node, flags, order);
      	if (page) {
    -@@ mm/slab_common.c: void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
    +@@ mm/slab_common.c: void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
      	return ptr;
      }
      
     +void *kmalloc_large(size_t size, gfp_t flags)
     +{
    -+	void *ret = __kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
    ++	void *ret = kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
     +
     +	trace_kmalloc(_RET_IP_, ret, NULL, size,
     +		      PAGE_SIZE << get_order(size), flags);
    @@ mm/slab_common.c: void *__kmalloc_large_node_notrace(size_t size, gfp_t flags, i
     +}
     +EXPORT_SYMBOL(kmalloc_large);
     +
    - void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
    + void *kmalloc_large_node(size_t size, gfp_t flags, int node)
      {
    - 	return __kmalloc_large_node_notrace(size, flags, node);
    + 	void *ret = kmalloc_large_node_notrace(size, flags, node);
10:  6ad83caba0a7 ! 10:  c46928674558 mm/slab: kmalloc: pass requests larger than order-1 page to page allocator
    @@ Commit message
         maintenance of common code.
     
         Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    +    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
     
      ## include/linux/slab.h ##
     @@ include/linux/slab.h: static inline unsigned int arch_slab_minalign(void)
11:  e5b712dc374c ! 11:  3215ee05c450 mm/sl[au]b: introduce common alloc/free functions without tracepoint
    @@ Commit message
         functions that does not have tracepoint.
     
         Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    +    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
     
      ## mm/slab.c ##
     @@ mm/slab.c: void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
12:  e44cd126a340 ! 12:  4d70e7590d3a mm/sl[au]b: generalize kmalloc subsystem
    @@ Commit message
         kfree(), __ksize(), __kmalloc(), __kmalloc_node() and move them
         to slab_common.c.
     
    +    In the meantime, rename kmalloc_large_node_notrace()
    +    to __kmalloc_large_node() and make it static as it's now only called in
    +    slab_common.c.
    +
         Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    +    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
     
      ## mm/slab.c ##
     @@ mm/slab.c: void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
    @@ mm/slab.c: void __check_heap_object(const void *ptr, unsigned long n,
     -}
     -EXPORT_SYMBOL(__ksize);
     
    + ## mm/slab.h ##
    +@@ mm/slab.h: void create_kmalloc_caches(slab_flags_t);
    + /* Find the kmalloc slab corresponding for a certain size */
    + struct kmem_cache *kmalloc_slab(size_t, gfp_t);
    + 
    +-void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node);
    + void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
    + 			      int node, size_t orig_size,
    + 			      unsigned long caller);
    +
      ## mm/slab_common.c ##
     @@ mm/slab_common.c: void free_large_kmalloc(struct folio *folio, void *object)
      			      -(PAGE_SIZE << order));
      	__free_pages(folio_page(folio, 0), order);
      }
     +
    ++static void *__kmalloc_large_node(size_t size, gfp_t flags, int node);
     +static __always_inline
     +void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
     +{
    @@ mm/slab_common.c: void free_large_kmalloc(struct folio *folio, void *object)
     +	void *ret;
     +
     +	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
    -+		ret = kmalloc_large_node_notrace(size, flags, node);
    ++		ret = __kmalloc_large_node(size, flags, node);
     +		trace_kmalloc_node(caller, ret, NULL,
     +				   size, PAGE_SIZE << get_order(size),
     +				   flags, node);
    @@ mm/slab_common.c: void free_large_kmalloc(struct folio *folio, void *object)
      #endif /* !CONFIG_SLOB */
      
      gfp_t kmalloc_fix_flags(gfp_t flags)
    +@@ mm/slab_common.c: gfp_t kmalloc_fix_flags(gfp_t flags)
    +  * know the allocation order to free the pages properly in kfree.
    +  */
    + 
    +-void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
    ++void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
    + {
    + 	struct page *page;
    + 	void *ptr = NULL;
    +@@ mm/slab_common.c: void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
    + 
    + void *kmalloc_large(size_t size, gfp_t flags)
    + {
    +-	void *ret = kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
    ++	void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);
    + 
    + 	trace_kmalloc(_RET_IP_, ret, NULL, size,
    + 		      PAGE_SIZE << get_order(size), flags);
    +@@ mm/slab_common.c: EXPORT_SYMBOL(kmalloc_large);
    + 
    + void *kmalloc_large_node(size_t size, gfp_t flags, int node)
    + {
    +-	void *ret = kmalloc_large_node_notrace(size, flags, node);
    ++	void *ret = __kmalloc_large_node(size, flags, node);
    + 
    + 	trace_kmalloc_node(_RET_IP_, ret, NULL, size,
    + 			   PAGE_SIZE << get_order(size), flags, node);
     
      ## mm/slub.c ##
     @@ mm/slub.c: static int __init setup_slub_min_objects(char *str)
 -:  ------------ > 13:  da6880a20924 mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace()
13:  a137bfbdb06b ! 14:  ef7a0f0d58db mm/slab_common: unify NUMA and UMA version of tracepoints
    @@ Commit message
         event classes does not makes sense at all.
     
         Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    +    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
     
      ## include/trace/events/kmem.h ##
     @@
    @@ mm/slab.c: void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_l
      
      	return ret;
      }
    -@@ mm/slab.c: kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
    - 
    - 	ret = kasan_kmalloc(cachep, ret, size, flags);
    - 	trace_kmalloc(_RET_IP_, ret, cachep,
    --		      size, cachep->size, flags);
    -+		      size, cachep->size, flags, NUMA_NO_NODE);
    - 	return ret;
    - }
    - EXPORT_SYMBOL(kmem_cache_alloc_trace);
     @@ mm/slab.c: void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
      {
      	void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);
    @@ mm/slab.c: void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, i
     -	trace_kmem_cache_alloc_node(_RET_IP_, ret, cachep,
     -				    cachep->object_size, cachep->size,
     -				    flags, nodeid);
    -+	trace_kmem_cache_alloc(_RET_IP_, ret, cachep,
    -+			       cachep->object_size, cachep->size,
    -+			       flags, nodeid);
    - 
    - 	return ret;
    - }
    -@@ mm/slab.c: void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
    - 	ret = slab_alloc_node(cachep, NULL, flags, nodeid, size, _RET_IP_);
    ++	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, cachep->object_size,
    ++			       cachep->size, flags, nodeid);
      
    - 	ret = kasan_kmalloc(cachep, ret, size, flags);
    --	trace_kmalloc_node(_RET_IP_, ret, cachep,
    --			   size, cachep->size,
    --			   flags, nodeid);
    -+	trace_kmalloc(_RET_IP_, ret, cachep,
    -+		      size, cachep->size,
    -+		      flags, nodeid);
      	return ret;
      }
    - EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
     
      ## mm/slab_common.c ##
     @@ mm/slab_common.c: void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
      
      	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
    - 		ret = kmalloc_large_node_notrace(size, flags, node);
    + 		ret = __kmalloc_large_node(size, flags, node);
     -		trace_kmalloc_node(caller, ret, NULL,
     -				   size, PAGE_SIZE << get_order(size),
     -				   flags, node);
    -+		trace_kmalloc(_RET_IP_, ret, NULL,
    -+			      size, PAGE_SIZE << get_order(size),
    -+			      flags, node);
    ++		trace_kmalloc(_RET_IP_, ret, NULL, size,
    ++			      PAGE_SIZE << get_order(size), flags, node);
      		return ret;
      	}
      
    @@ mm/slab_common.c: void *__do_kmalloc_node(size_t size, gfp_t flags, int node, un
      	ret = kasan_kmalloc(s, ret, size, flags);
     -	trace_kmalloc_node(caller, ret, s, size,
     -			   s->size, flags, node);
    -+	trace_kmalloc(_RET_IP_, ret, s, size,
    -+		      s->size, flags, node);
    ++	trace_kmalloc(_RET_IP_, ret, s, size, s->size, flags, node);
      	return ret;
      }
      
    +@@ mm/slab_common.c: void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
    + 	void *ret = __kmem_cache_alloc_node(s, gfpflags, NUMA_NO_NODE,
    + 					    size, _RET_IP_);
    + 
    +-	trace_kmalloc_node(_RET_IP_, ret, s, size, s->size,
    +-			   gfpflags, NUMA_NO_NODE);
    ++	trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, NUMA_NO_NODE);
    + 
    + 	ret = kasan_kmalloc(s, ret, size, gfpflags);
    + 	return ret;
    +@@ mm/slab_common.c: void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
    + {
    + 	void *ret = __kmem_cache_alloc_node(s, gfpflags, node, size, _RET_IP_);
    + 
    +-	trace_kmalloc_node(_RET_IP_, ret, s, size, s->size, gfpflags, node);
    ++	trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, node);
    + 
    + 	ret = kasan_kmalloc(s, ret, size, gfpflags);
    + 	return ret;
     @@ mm/slab_common.c: void *kmalloc_large(size_t size, gfp_t flags)
    - 	void *ret = __kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
    + 	void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);
      
      	trace_kmalloc(_RET_IP_, ret, NULL, size,
     -		      PAGE_SIZE << get_order(size), flags);
    @@ mm/slab_common.c: void *kmalloc_large(size_t size, gfp_t flags)
      EXPORT_SYMBOL(kmalloc_large);
     @@ mm/slab_common.c: void *kmalloc_large_node(size_t size, gfp_t flags, int node)
      {
    - 	void *ret = __kmalloc_large_node_notrace(size, flags, node);
    + 	void *ret = __kmalloc_large_node(size, flags, node);
      
     -	trace_kmalloc_node(_RET_IP_, ret, NULL, size,
     -			   PAGE_SIZE << get_order(size), flags, node);
    @@ mm/slob.c: __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long cal
      
     -		trace_kmalloc_node(caller, ret, NULL,
     -				   size, size + minalign, gfp, node);
    -+		trace_kmalloc(caller, ret, NULL,
    -+			      size, size + minalign, gfp, node);
    ++		trace_kmalloc(caller, ret, NULL, size,
    ++			      size + minalign, gfp, node);
      	} else {
      		unsigned int order = get_order(size);
      
    @@ mm/slob.c: __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long cal
      
     -		trace_kmalloc_node(caller, ret, NULL,
     -				   size, PAGE_SIZE << order, gfp, node);
    -+		trace_kmalloc(caller, ret, NULL,
    -+			      size, PAGE_SIZE << order, gfp, node);
    ++		trace_kmalloc(caller, ret, NULL, size,
    ++			      PAGE_SIZE << order, gfp, node);
      	}
      
      	kmemleak_alloc(ret, size, 1, gfp);
    @@ mm/slub.c: void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *l
     -				s->size, gfpflags);
     +				s->size, gfpflags, NUMA_NO_NODE);
      
    - 	return ret;
    - }
    -@@ mm/slub.c: void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
    - void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
    - {
    - 	void *ret = slab_alloc(s, NULL, gfpflags, _RET_IP_, size);
    --	trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags);
    -+	trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, NUMA_NO_NODE);
    - 	ret = kasan_kmalloc(s, ret, size, gfpflags);
      	return ret;
      }
     @@ mm/slub.c: void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
    @@ mm/slub.c: void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int
      
     -	trace_kmem_cache_alloc_node(_RET_IP_, ret, s,
     -				    s->object_size, s->size, gfpflags, node);
    -+	trace_kmem_cache_alloc(_RET_IP_, ret, s,
    -+			       s->object_size, s->size, gfpflags, node);
    ++	trace_kmem_cache_alloc(_RET_IP_, ret, s, s->object_size,
    ++			       s->size, gfpflags, node);
      
      	return ret;
      }
    -@@ mm/slub.c: void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
    - {
    - 	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
    - 
    --	trace_kmalloc_node(_RET_IP_, ret, s,
    --			   size, s->size, gfpflags, node);
    -+	trace_kmalloc(_RET_IP_, ret, s,
    -+		      size, s->size, gfpflags, node);
    - 
    - 	ret = kasan_kmalloc(s, ret, size, gfpflags);
    - 	return ret;
14:  6d2b911a8274 ! 15:  998f51e44ff8 mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using
    @@ Commit message
              gfp flag is enough to know if it's accounted or not.
            - Avoid dereferencing s->object_size and s->size when not using kmem_cache_alloc event.
            - Avoid dereferencing s->name in when not using kmem_cache_free event.
    +       - Adjust s->size to SLOB_UNITS(s->size) * SLOB_UNIT in SLOB
     
    +    Cc: Vasily Averin <vasily.averin@linux.dev>
         Suggested-by: Vlastimil Babka <vbabka@suse.cz>
         Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    +    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
     
      ## include/trace/events/kmem.h ##
     @@
    @@ include/trace/events/kmem.h: DECLARE_EVENT_CLASS(kmem_alloc,
      		__entry->gfp_flags	= (__force unsigned long)gfp_flags;
      		__entry->node		= node;
      		__entry->accounted	= IS_ENABLED(CONFIG_MEMCG_KMEM) ?
    + 					  ((gfp_flags & __GFP_ACCOUNT) ||
    +-					  (s && s->flags & SLAB_ACCOUNT)) : false;
    ++					  (s->flags & SLAB_ACCOUNT)) : false;
    + 	),
    + 
    + 	TP_printk("call_site=%pS ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d accounted=%s",
     @@ include/trace/events/kmem.h: DECLARE_EVENT_CLASS(kmem_alloc,
      		__entry->accounted ? "true" : "false")
      );
    @@ include/trace/events/kmem.h: DECLARE_EVENT_CLASS(kmem_alloc,
     +		__field(	size_t,		bytes_alloc	)
     +		__field(	unsigned long,	gfp_flags	)
     +		__field(	int,		node		)
    -+		__field(	bool,		accounted	)
     +	),
      
     -	TP_PROTO(unsigned long call_site, const void *ptr,
    @@ include/trace/events/kmem.h: DECLARE_EVENT_CLASS(kmem_alloc,
     +		__entry->bytes_alloc	= bytes_alloc;
     +		__entry->gfp_flags	= (__force unsigned long)gfp_flags;
     +		__entry->node		= node;
    -+		__entry->accounted	= IS_ENABLED(CONFIG_MEMCG_KMEM) ?
    -+					  (gfp_flags & __GFP_ACCOUNT) : false;
     +	),
      
     -	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
    @@ include/trace/events/kmem.h: DECLARE_EVENT_CLASS(kmem_alloc,
     +		__entry->bytes_alloc,
     +		show_gfp_flags(__entry->gfp_flags),
     +		__entry->node,
    -+		__entry->accounted ? "true" : "false")
    ++		(IS_ENABLED(CONFIG_MEMCG_KMEM) &&
    ++		 (__entry->gfp_flags & __GFP_ACCOUNT)) ? "true" : "false")
      );
      
      TRACE_EVENT(kfree,
    @@ mm/slab.c: void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_l
      
      	return ret;
      }
    -@@ mm/slab.c: kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
    - 	ret = slab_alloc(cachep, NULL, flags, size, _RET_IP_);
    - 
    - 	ret = kasan_kmalloc(cachep, ret, size, flags);
    --	trace_kmalloc(_RET_IP_, ret, cachep,
    --		      size, cachep->size, flags, NUMA_NO_NODE);
    -+	trace_kmalloc(_RET_IP_, ret,
    -+		      size, cachep->size,
    -+		      flags, NUMA_NO_NODE);
    - 	return ret;
    - }
    - EXPORT_SYMBOL(kmem_cache_alloc_trace);
     @@ mm/slab.c: void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
      {
      	void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);
      
    --	trace_kmem_cache_alloc(_RET_IP_, ret, cachep,
    --			       cachep->object_size, cachep->size,
    --			       flags, nodeid);
    +-	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, cachep->object_size,
    +-			       cachep->size, flags, nodeid);
     +	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, flags, nodeid);
      
      	return ret;
      }
    -@@ mm/slab.c: void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
    - 	ret = slab_alloc_node(cachep, NULL, flags, nodeid, size, _RET_IP_);
    - 
    - 	ret = kasan_kmalloc(cachep, ret, size, flags);
    --	trace_kmalloc(_RET_IP_, ret, cachep,
    -+	trace_kmalloc(_RET_IP_, ret,
    - 		      size, cachep->size,
    - 		      flags, nodeid);
    - 	return ret;
     @@ mm/slab.c: void kmem_cache_free(struct kmem_cache *cachep, void *objp)
      	if (!cachep)
      		return;
    @@ mm/slab_common.c
     @@ mm/slab_common.c: void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
      
      	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
    - 		ret = kmalloc_large_node_notrace(size, flags, node);
    --		trace_kmalloc(_RET_IP_, ret, NULL,
    -+		trace_kmalloc(_RET_IP_, ret,
    - 			      size, PAGE_SIZE << get_order(size),
    - 			      flags, node);
    + 		ret = __kmalloc_large_node(size, flags, node);
    +-		trace_kmalloc(_RET_IP_, ret, NULL, size,
    ++		trace_kmalloc(_RET_IP_, ret, size,
    + 			      PAGE_SIZE << get_order(size), flags, node);
      		return ret;
    + 	}
     @@ mm/slab_common.c: void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
      
      	ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
      	ret = kasan_kmalloc(s, ret, size, flags);
    --	trace_kmalloc(_RET_IP_, ret, s, size,
    --		      s->size, flags, node);
    -+	trace_kmalloc(_RET_IP_, ret,
    -+		      size, s->size,
    -+		      flags, node);
    +-	trace_kmalloc(_RET_IP_, ret, s, size, s->size, flags, node);
    ++	trace_kmalloc(_RET_IP_, ret, size, s->size, flags, node);
      	return ret;
      }
      
    +@@ mm/slab_common.c: void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
    + 	void *ret = __kmem_cache_alloc_node(s, gfpflags, NUMA_NO_NODE,
    + 					    size, _RET_IP_);
    + 
    +-	trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, NUMA_NO_NODE);
    ++	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
    + 
    + 	ret = kasan_kmalloc(s, ret, size, gfpflags);
    + 	return ret;
    +@@ mm/slab_common.c: void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
    + {
    + 	void *ret = __kmem_cache_alloc_node(s, gfpflags, node, size, _RET_IP_);
    + 
    +-	trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, node);
    ++	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);
    + 
    + 	ret = kasan_kmalloc(s, ret, size, gfpflags);
    + 	return ret;
     @@ mm/slab_common.c: void *kmalloc_large(size_t size, gfp_t flags)
      {
    - 	void *ret = __kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
    + 	void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);
      
     -	trace_kmalloc(_RET_IP_, ret, NULL, size,
     -		      PAGE_SIZE << get_order(size), flags, NUMA_NO_NODE);
    -+	trace_kmalloc(_RET_IP_, ret,
    -+		      size, PAGE_SIZE << get_order(size),
    ++	trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << get_order(size),
     +		      flags, NUMA_NO_NODE);
      	return ret;
      }
      EXPORT_SYMBOL(kmalloc_large);
     @@ mm/slab_common.c: void *kmalloc_large_node(size_t size, gfp_t flags, int node)
      {
    - 	void *ret = __kmalloc_large_node_notrace(size, flags, node);
    + 	void *ret = __kmalloc_large_node(size, flags, node);
      
     -	trace_kmalloc(_RET_IP_, ret, NULL, size,
     -		      PAGE_SIZE << get_order(size), flags, node);
    -+	trace_kmalloc(_RET_IP_, ret,
    -+		      size, PAGE_SIZE << get_order(size),
    ++	trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << get_order(size),
     +		      flags, node);
      	return ret;
      }
    @@ mm/slob.c: __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long cal
      		*m = size;
      		ret = (void *)m + minalign;
      
    --		trace_kmalloc(caller, ret, NULL,
    --			      size, size + minalign, gfp, node);
    +-		trace_kmalloc(caller, ret, NULL, size,
    +-			      size + minalign, gfp, node);
     +		trace_kmalloc(caller, ret, size, size + minalign, gfp, node);
      	} else {
      		unsigned int order = get_order(size);
    @@ mm/slob.c: __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long cal
      			gfp |= __GFP_COMP;
      		ret = slob_new_pages(gfp, order, node);
      
    --		trace_kmalloc(caller, ret, NULL,
    --			      size, PAGE_SIZE << order, gfp, node);
    +-		trace_kmalloc(caller, ret, NULL, size,
    +-			      PAGE_SIZE << order, gfp, node);
     +		trace_kmalloc(caller, ret, size, PAGE_SIZE << order, gfp, node);
      	}
      
      	kmemleak_alloc(ret, size, 1, gfp);
    +@@ mm/slob.c: int __kmem_cache_create(struct kmem_cache *c, slab_flags_t flags)
    + 		/* leave room for rcu footer at the end of object */
    + 		c->size += sizeof(struct slob_rcu);
    + 	}
    ++
    ++	/* Actual size allocated */
    ++	c->size = SLOB_UNITS(c->size) * SLOB_UNIT;
    + 	c->flags = flags;
    + 	return 0;
    + }
     @@ mm/slob.c: static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
      
      	if (c->size < PAGE_SIZE) {
    @@ mm/slub.c: void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *l
     -				s->size, gfpflags, NUMA_NO_NODE);
     +	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
      
    - 	return ret;
    - }
    -@@ mm/slub.c: void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
    - void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
    - {
    - 	void *ret = slab_alloc(s, NULL, gfpflags, _RET_IP_, size);
    --	trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, NUMA_NO_NODE);
    -+	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
    - 	ret = kasan_kmalloc(s, ret, size, gfpflags);
      	return ret;
      }
     @@ mm/slub.c: void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
      {
      	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
      
    --	trace_kmem_cache_alloc(_RET_IP_, ret, s,
    --			       s->object_size, s->size, gfpflags, node);
    +-	trace_kmem_cache_alloc(_RET_IP_, ret, s, s->object_size,
    +-			       s->size, gfpflags, node);
     +	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, node);
      
      	return ret;
      }
    -@@ mm/slub.c: void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
    - {
    - 	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
    - 
    --	trace_kmalloc(_RET_IP_, ret, s,
    --		      size, s->size, gfpflags, node);
    -+	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);
    - 
    - 	ret = kasan_kmalloc(s, ret, size, gfpflags);
    - 	return ret;
     @@ mm/slub.c: void kmem_cache_free(struct kmem_cache *s, void *x)
      	s = cache_from_obj(s, x);
      	if (!s)
15:  566fdd67515d ! 16:  cd1a424103f5 mm/slab_common: move definition of __ksize() to mm/slab.h
    @@ Metadata
     Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
     
      ## Commit message ##
    -    mm/slab_common: move definition of __ksize() to mm/slab.h
    +    mm/slab_common: move declaration of __ksize() to mm/slab.h
     
         __ksize() is only called by KASAN. Remove export symbol and move
    -    definition to mm/slab.h as we don't want to grow its callers.
    +    declaration to mm/slab.h as we don't want to grow its callers.
     
         Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
         Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    @@ mm/slab_common.c: size_t __ksize(const void *object)
      	return slab_ksize(folio_slab(folio)->slab_cache);
      }
     -EXPORT_SYMBOL(__ksize);
    - #endif /* !CONFIG_SLOB */
      
    - gfp_t kmalloc_fix_flags(gfp_t flags)
    + #ifdef CONFIG_TRACING
    + void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
     
      ## mm/slob.c ##
     @@ mm/slob.c: size_t __ksize(const void *block)
16:  2c99a66a9307 ! 17:  11bd80a065e4 mm/sl[au]b: check if large object is valid in __ksize()
    @@ Metadata
      ## Commit message ##
         mm/sl[au]b: check if large object is valid in __ksize()
     
    -    __ksize() returns size of objects allocated from slab allocator.
    -    When invalid object is passed to __ksize(), returning zero
    -    prevents further memory corruption and makes caller be able to
    -    check if there is an error.
    -
         If address of large object is not beginning of folio or size of
    -    the folio is too small, it must be invalid. Return zero in such cases.
    +    the folio is too small, it must be invalid. BUG() in such cases.
     
    +    Cc: Marco Elver <elver@google.com>
         Suggested-by: Vlastimil Babka <vbabka@suse.cz>
         Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
    +    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
     
      ## mm/slab_common.c ##
     @@ mm/slab_common.c: size_t __ksize(const void *object)
    @@ mm/slab_common.c: size_t __ksize(const void *object)
      
     -	if (unlikely(!folio_test_slab(folio)))
     +	if (unlikely(!folio_test_slab(folio))) {
    -+		if (WARN_ON(object != folio_address(folio) ||
    -+				folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE))
    -+			return 0;
    ++		BUG_ON(folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE);
    ++		BUG_ON(object != folio_address(folio));
      		return folio_size(folio);
     +	}
      
-- 
2.32.0


* [PATCH v4 01/17] mm/slab: move NUMA-related code to __do_cache_alloc()
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
@ 2022-08-17 10:18 ` Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 02/17] mm/slab: cleanup slab_alloc() and slab_alloc_node() Hyeonggon Yoo
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel

To implement slab_alloc_node() independent of NUMA configuration,
move NUMA fallback/alternate allocation code into __do_cache_alloc().

One functional change here is that node availability is no longer
checked when allocating from the local node.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab.c | 68 +++++++++++++++++++++++++------------------------------
 1 file changed, 31 insertions(+), 37 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 10e96137b44f..1656393f55cb 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3180,13 +3180,14 @@ static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
 	return obj ? obj : fallback_alloc(cachep, flags);
 }
 
+static void *__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid);
+
 static __always_inline void *
 slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_size,
 		   unsigned long caller)
 {
 	unsigned long save_flags;
 	void *ptr;
-	int slab_node = numa_mem_id();
 	struct obj_cgroup *objcg = NULL;
 	bool init = false;
 
@@ -3200,30 +3201,7 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_
 		goto out_hooks;
 
 	local_irq_save(save_flags);
-
-	if (nodeid == NUMA_NO_NODE)
-		nodeid = slab_node;
-
-	if (unlikely(!get_node(cachep, nodeid))) {
-		/* Node not bootstrapped yet */
-		ptr = fallback_alloc(cachep, flags);
-		goto out;
-	}
-
-	if (nodeid == slab_node) {
-		/*
-		 * Use the locally cached objects if possible.
-		 * However ____cache_alloc does not allow fallback
-		 * to other nodes. It may fail while we still have
-		 * objects on other nodes available.
-		 */
-		ptr = ____cache_alloc(cachep, flags);
-		if (ptr)
-			goto out;
-	}
-	/* ___cache_alloc_node can fall back to other nodes */
-	ptr = ____cache_alloc_node(cachep, flags, nodeid);
-out:
+	ptr = __do_cache_alloc(cachep, flags, nodeid);
 	local_irq_restore(save_flags);
 	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
 	init = slab_want_init_on_alloc(flags, cachep);
@@ -3234,31 +3212,46 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_
 }
 
 static __always_inline void *
-__do_cache_alloc(struct kmem_cache *cache, gfp_t flags)
+__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 {
-	void *objp;
+	void *objp = NULL;
+	int slab_node = numa_mem_id();
 
-	if (current->mempolicy || cpuset_do_slab_mem_spread()) {
-		objp = alternate_node_alloc(cache, flags);
-		if (objp)
-			goto out;
+	if (nodeid == NUMA_NO_NODE) {
+		if (current->mempolicy || cpuset_do_slab_mem_spread()) {
+			objp = alternate_node_alloc(cachep, flags);
+			if (objp)
+				goto out;
+		}
+		/*
+		 * Use the locally cached objects if possible.
+		 * However ____cache_alloc does not allow fallback
+		 * to other nodes. It may fail while we still have
+		 * objects on other nodes available.
+		 */
+		objp = ____cache_alloc(cachep, flags);
+		nodeid = slab_node;
+	} else if (nodeid == slab_node) {
+		objp = ____cache_alloc(cachep, flags);
+	} else if (!get_node(cachep, nodeid)) {
+		/* Node not bootstrapped yet */
+		objp = fallback_alloc(cachep, flags);
+		goto out;
 	}
-	objp = ____cache_alloc(cache, flags);
 
 	/*
 	 * We may just have run out of memory on the local node.
 	 * ____cache_alloc_node() knows how to locate memory on other nodes
 	 */
 	if (!objp)
-		objp = ____cache_alloc_node(cache, flags, numa_mem_id());
-
+		objp = ____cache_alloc_node(cachep, flags, nodeid);
 out:
 	return objp;
 }
 #else
 
 static __always_inline void *
-__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
+__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid __maybe_unused)
 {
 	return ____cache_alloc(cachep, flags);
 }
@@ -3284,7 +3277,7 @@ slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
 		goto out;
 
 	local_irq_save(save_flags);
-	objp = __do_cache_alloc(cachep, flags);
+	objp = __do_cache_alloc(cachep, flags, NUMA_NO_NODE);
 	local_irq_restore(save_flags);
 	objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
 	prefetchw(objp);
@@ -3521,7 +3514,8 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 
 	local_irq_disable();
 	for (i = 0; i < size; i++) {
-		void *objp = kfence_alloc(s, s->object_size, flags) ?: __do_cache_alloc(s, flags);
+		void *objp = kfence_alloc(s, s->object_size, flags) ?:
+			     __do_cache_alloc(s, flags, NUMA_NO_NODE);
 
 		if (unlikely(!objp))
 			goto error;
-- 
2.32.0



* [PATCH v4 02/17] mm/slab: cleanup slab_alloc() and slab_alloc_node()
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 01/17] mm/slab: move NUMA-related code to __do_cache_alloc() Hyeonggon Yoo
@ 2022-08-17 10:18 ` Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 03/17] mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions Hyeonggon Yoo
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel

Make slab_alloc_node() available even when CONFIG_NUMA=n and
make slab_alloc() a wrapper of slab_alloc_node().

This is necessary for further cleanup.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab.c | 49 +++++++++++++------------------------------------
 1 file changed, 13 insertions(+), 36 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 1656393f55cb..748dd085f38e 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3180,37 +3180,6 @@ static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
 	return obj ? obj : fallback_alloc(cachep, flags);
 }
 
-static void *__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid);
-
-static __always_inline void *
-slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_size,
-		   unsigned long caller)
-{
-	unsigned long save_flags;
-	void *ptr;
-	struct obj_cgroup *objcg = NULL;
-	bool init = false;
-
-	flags &= gfp_allowed_mask;
-	cachep = slab_pre_alloc_hook(cachep, NULL, &objcg, 1, flags);
-	if (unlikely(!cachep))
-		return NULL;
-
-	ptr = kfence_alloc(cachep, orig_size, flags);
-	if (unlikely(ptr))
-		goto out_hooks;
-
-	local_irq_save(save_flags);
-	ptr = __do_cache_alloc(cachep, flags, nodeid);
-	local_irq_restore(save_flags);
-	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
-	init = slab_want_init_on_alloc(flags, cachep);
-
-out_hooks:
-	slab_post_alloc_hook(cachep, objcg, flags, 1, &ptr, init);
-	return ptr;
-}
-
 static __always_inline void *
 __do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 {
@@ -3259,8 +3228,8 @@ __do_cache_alloc(struct kmem_cache *cachep, gfp_t flags, int nodeid __maybe_unus
 #endif /* CONFIG_NUMA */
 
 static __always_inline void *
-slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
-	   size_t orig_size, unsigned long caller)
+slab_alloc_node(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
+		int nodeid, size_t orig_size, unsigned long caller)
 {
 	unsigned long save_flags;
 	void *objp;
@@ -3277,7 +3246,7 @@ slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
 		goto out;
 
 	local_irq_save(save_flags);
-	objp = __do_cache_alloc(cachep, flags, NUMA_NO_NODE);
+	objp = __do_cache_alloc(cachep, flags, nodeid);
 	local_irq_restore(save_flags);
 	objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
 	prefetchw(objp);
@@ -3288,6 +3257,14 @@ slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
 	return objp;
 }
 
+static __always_inline void *
+slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
+	   size_t orig_size, unsigned long caller)
+{
+	return slab_alloc_node(cachep, lru, flags, NUMA_NO_NODE, orig_size,
+			       caller);
+}
+
 /*
  * Caller needs to acquire correct kmem_cache_node's list_lock
  * @list: List of detached free slabs should be freed by caller
@@ -3574,7 +3551,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_trace);
  */
 void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 {
-	void *ret = slab_alloc_node(cachep, flags, nodeid, cachep->object_size, _RET_IP_);
+	void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);
 
 	trace_kmem_cache_alloc_node(_RET_IP_, ret, cachep,
 				    cachep->object_size, cachep->size,
@@ -3592,7 +3569,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
 {
 	void *ret;
 
-	ret = slab_alloc_node(cachep, flags, nodeid, size, _RET_IP_);
+	ret = slab_alloc_node(cachep, NULL, flags, nodeid, size, _RET_IP_);
 
 	ret = kasan_kmalloc(cachep, ret, size, flags);
 	trace_kmalloc_node(_RET_IP_, ret, cachep,
-- 
2.32.0



* [PATCH v4 03/17] mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 01/17] mm/slab: move NUMA-related code to __do_cache_alloc() Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 02/17] mm/slab: cleanup slab_alloc() and slab_alloc_node() Hyeonggon Yoo
@ 2022-08-17 10:18 ` Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 04/17] mm/slab_common: cleanup kmalloc_track_caller() Hyeonggon Yoo
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel

Now that slab_alloc_node() is available for SLAB when CONFIG_NUMA=n,
remove CONFIG_NUMA ifdefs for common kmalloc functions.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slab.h | 28 ----------------------------
 mm/slab.c            |  2 --
 mm/slob.c            |  5 +----
 mm/slub.c            |  6 ------
 4 files changed, 1 insertion(+), 40 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0fefdf528e0d..4754c834b0e3 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -456,38 +456,18 @@ static __always_inline void kfree_bulk(size_t size, void **p)
 	kmem_cache_free_bulk(NULL, size, p);
 }
 
-#ifdef CONFIG_NUMA
 void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_kmalloc_alignment
 							 __alloc_size(1);
 void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assume_slab_alignment
 									 __malloc;
-#else
-static __always_inline __alloc_size(1) void *__kmalloc_node(size_t size, gfp_t flags, int node)
-{
-	return __kmalloc(size, flags);
-}
-
-static __always_inline void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node)
-{
-	return kmem_cache_alloc(s, flags);
-}
-#endif
 
 #ifdef CONFIG_TRACING
 extern void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
 				   __assume_slab_alignment __alloc_size(3);
 
-#ifdef CONFIG_NUMA
 extern void *kmem_cache_alloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
 					 int node, size_t size) __assume_slab_alignment
 								__alloc_size(4);
-#else
-static __always_inline __alloc_size(4) void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
-						 gfp_t gfpflags, int node, size_t size)
-{
-	return kmem_cache_alloc_trace(s, gfpflags, size);
-}
-#endif /* CONFIG_NUMA */
 
 #else /* CONFIG_TRACING */
 static __always_inline __alloc_size(3) void *kmem_cache_alloc_trace(struct kmem_cache *s,
@@ -701,20 +681,12 @@ static inline __alloc_size(1, 2) void *kcalloc_node(size_t n, size_t size, gfp_t
 }
 
 
-#ifdef CONFIG_NUMA
 extern void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
 					 unsigned long caller) __alloc_size(1);
 #define kmalloc_node_track_caller(size, flags, node) \
 	__kmalloc_node_track_caller(size, flags, node, \
 			_RET_IP_)
 
-#else /* CONFIG_NUMA */
-
-#define kmalloc_node_track_caller(size, flags, node) \
-	kmalloc_track_caller(size, flags)
-
-#endif /* CONFIG_NUMA */
-
 /*
  * Shortcuts
  */
diff --git a/mm/slab.c b/mm/slab.c
index 748dd085f38e..0acd65358c83 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3535,7 +3535,6 @@ kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
 EXPORT_SYMBOL(kmem_cache_alloc_trace);
 #endif
 
-#ifdef CONFIG_NUMA
 /**
  * kmem_cache_alloc_node - Allocate an object on the specified node
  * @cachep: The cache to allocate from.
@@ -3609,7 +3608,6 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
 	return __do_kmalloc_node(size, flags, node, caller);
 }
 EXPORT_SYMBOL(__kmalloc_node_track_caller);
-#endif /* CONFIG_NUMA */
 
 #ifdef CONFIG_PRINTK
 void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
diff --git a/mm/slob.c b/mm/slob.c
index 2bd4f476c340..74d850967213 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -536,14 +536,12 @@ void *__kmalloc_track_caller(size_t size, gfp_t gfp, unsigned long caller)
 }
 EXPORT_SYMBOL(__kmalloc_track_caller);
 
-#ifdef CONFIG_NUMA
 void *__kmalloc_node_track_caller(size_t size, gfp_t gfp,
 					int node, unsigned long caller)
 {
 	return __do_kmalloc_node(size, gfp, node, caller);
 }
 EXPORT_SYMBOL(__kmalloc_node_track_caller);
-#endif
 
 void kfree(const void *block)
 {
@@ -647,7 +645,7 @@ void *kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru, gfp_
 	return slob_alloc_node(cachep, flags, NUMA_NO_NODE);
 }
 EXPORT_SYMBOL(kmem_cache_alloc_lru);
-#ifdef CONFIG_NUMA
+
 void *__kmalloc_node(size_t size, gfp_t gfp, int node)
 {
 	return __do_kmalloc_node(size, gfp, node, _RET_IP_);
@@ -659,7 +657,6 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t gfp, int node)
 	return slob_alloc_node(cachep, gfp, node);
 }
 EXPORT_SYMBOL(kmem_cache_alloc_node);
-#endif
 
 static void __kmem_cache_free(void *b, int size)
 {
diff --git a/mm/slub.c b/mm/slub.c
index 862dbd9af4f5..b29b3c9d3175 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3287,7 +3287,6 @@ void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
 EXPORT_SYMBOL(kmem_cache_alloc_trace);
 #endif
 
-#ifdef CONFIG_NUMA
 void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
 {
 	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
@@ -3314,7 +3313,6 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
 }
 EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
 #endif
-#endif	/* CONFIG_NUMA */
 
 /*
  * Slow path handling. This may still be called frequently since objects
@@ -4427,7 +4425,6 @@ void *__kmalloc(size_t size, gfp_t flags)
 }
 EXPORT_SYMBOL(__kmalloc);
 
-#ifdef CONFIG_NUMA
 static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 {
 	struct page *page;
@@ -4474,7 +4471,6 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
 	return ret;
 }
 EXPORT_SYMBOL(__kmalloc_node);
-#endif	/* CONFIG_NUMA */
 
 #ifdef CONFIG_HARDENED_USERCOPY
 /*
@@ -4930,7 +4926,6 @@ void *__kmalloc_track_caller(size_t size, gfp_t gfpflags, unsigned long caller)
 }
 EXPORT_SYMBOL(__kmalloc_track_caller);
 
-#ifdef CONFIG_NUMA
 void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
 					int node, unsigned long caller)
 {
@@ -4960,7 +4955,6 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
 	return ret;
 }
 EXPORT_SYMBOL(__kmalloc_node_track_caller);
-#endif
 
 #ifdef CONFIG_SYSFS
 static int count_inuse(struct slab *slab)
-- 
2.32.0



* [PATCH v4 04/17] mm/slab_common: cleanup kmalloc_track_caller()
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
                   ` (2 preceding siblings ...)
  2022-08-17 10:18 ` [PATCH v4 03/17] mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions Hyeonggon Yoo
@ 2022-08-17 10:18 ` Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 05/17] mm/sl[au]b: factor out __do_kmalloc_node() Hyeonggon Yoo
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel

Make kmalloc_track_caller() a wrapper of kmalloc_node_track_caller().

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slab.h | 17 ++++++++---------
 mm/slab.c            |  6 ------
 mm/slob.c            |  6 ------
 mm/slub.c            | 22 ----------------------
 4 files changed, 8 insertions(+), 43 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 4754c834b0e3..a0e57df3d5a4 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -651,6 +651,12 @@ static inline __alloc_size(1, 2) void *kcalloc(size_t n, size_t size, gfp_t flag
 	return kmalloc_array(n, size, flags | __GFP_ZERO);
 }
 
+void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
+				  unsigned long caller) __alloc_size(1);
+#define kmalloc_node_track_caller(size, flags, node) \
+	__kmalloc_node_track_caller(size, flags, node, \
+				    _RET_IP_)
+
 /*
  * kmalloc_track_caller is a special version of kmalloc that records the
  * calling function of the routine calling it for slab leak tracking instead
@@ -659,9 +665,9 @@ static inline __alloc_size(1, 2) void *kcalloc(size_t n, size_t size, gfp_t flag
  * allocator where we care about the real place the memory allocation
  * request comes from.
  */
-extern void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller);
 #define kmalloc_track_caller(size, flags) \
-	__kmalloc_track_caller(size, flags, _RET_IP_)
+	__kmalloc_node_track_caller(size, flags, \
+				    NUMA_NO_NODE, _RET_IP_)
 
 static inline __alloc_size(1, 2) void *kmalloc_array_node(size_t n, size_t size, gfp_t flags,
 							  int node)
@@ -680,13 +686,6 @@ static inline __alloc_size(1, 2) void *kcalloc_node(size_t n, size_t size, gfp_t
 	return kmalloc_array_node(n, size, flags | __GFP_ZERO, node);
 }
 
-
-extern void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
-					 unsigned long caller) __alloc_size(1);
-#define kmalloc_node_track_caller(size, flags, node) \
-	__kmalloc_node_track_caller(size, flags, node, \
-			_RET_IP_)
-
 /*
  * Shortcuts
  */
diff --git a/mm/slab.c b/mm/slab.c
index 0acd65358c83..611e630ff860 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3665,12 +3665,6 @@ void *__kmalloc(size_t size, gfp_t flags)
 }
 EXPORT_SYMBOL(__kmalloc);
 
-void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller)
-{
-	return __do_kmalloc(size, flags, caller);
-}
-EXPORT_SYMBOL(__kmalloc_track_caller);
-
 /**
  * kmem_cache_free - Deallocate an object
  * @cachep: The cache the allocation was from.
diff --git a/mm/slob.c b/mm/slob.c
index 74d850967213..96b08acd72ce 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -530,12 +530,6 @@ void *__kmalloc(size_t size, gfp_t gfp)
 }
 EXPORT_SYMBOL(__kmalloc);
 
-void *__kmalloc_track_caller(size_t size, gfp_t gfp, unsigned long caller)
-{
-	return __do_kmalloc_node(size, gfp, NUMA_NO_NODE, caller);
-}
-EXPORT_SYMBOL(__kmalloc_track_caller);
-
 void *__kmalloc_node_track_caller(size_t size, gfp_t gfp,
 					int node, unsigned long caller)
 {
diff --git a/mm/slub.c b/mm/slub.c
index b29b3c9d3175..c82a4062f730 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4904,28 +4904,6 @@ int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags)
 	return 0;
 }
 
-void *__kmalloc_track_caller(size_t size, gfp_t gfpflags, unsigned long caller)
-{
-	struct kmem_cache *s;
-	void *ret;
-
-	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
-		return kmalloc_large(size, gfpflags);
-
-	s = kmalloc_slab(size, gfpflags);
-
-	if (unlikely(ZERO_OR_NULL_PTR(s)))
-		return s;
-
-	ret = slab_alloc(s, NULL, gfpflags, caller, size);
-
-	/* Honor the call site pointer we received. */
-	trace_kmalloc(caller, ret, s, size, s->size, gfpflags);
-
-	return ret;
-}
-EXPORT_SYMBOL(__kmalloc_track_caller);
-
 void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
 					int node, unsigned long caller)
 {
-- 
2.32.0



* [PATCH v4 05/17] mm/sl[au]b: factor out __do_kmalloc_node()
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
                   ` (3 preceding siblings ...)
  2022-08-17 10:18 ` [PATCH v4 04/17] mm/slab_common: cleanup kmalloc_track_caller() Hyeonggon Yoo
@ 2022-08-17 10:18 ` Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 06/17] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large() Hyeonggon Yoo
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel

__kmalloc(), __kmalloc_node() and __kmalloc_node_track_caller()
mostly do the same job. Factor out the common code into
__do_kmalloc_node().

Note that this patch also fixes a missing kasan_kmalloc() call in
SLUB's __kmalloc_node_track_caller().
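
For orientation, the SLUB side ends up shaped like this (condensed
from the hunks below; everything here is taken from the patch, kernel
context assumed, not a standalone compilation unit):

static __always_inline
void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
{
	struct kmem_cache *s;
	void *ret;

	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
		ret = kmalloc_large_node(size, flags, node);
		trace_kmalloc_node(caller, ret, NULL, size,
				   PAGE_SIZE << get_order(size), flags, node);
		return ret;
	}

	s = kmalloc_slab(size, flags);
	if (unlikely(ZERO_OR_NULL_PTR(s)))
		return s;

	ret = slab_alloc_node(s, NULL, flags, node, caller, size);
	trace_kmalloc_node(caller, ret, s, size, s->size, flags, node);
	/* now also runs on the _track_caller path (the fix noted above) */
	return kasan_kmalloc(s, ret, size, flags);
}

/* the exported entry points become one-line wrappers, e.g.: */
void *__kmalloc(size_t size, gfp_t flags)
{
	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
}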

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab.c | 30 +----------------------
 mm/slub.c | 71 +++++++++++++++----------------------------------------
 2 files changed, 20 insertions(+), 81 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 611e630ff860..8c08d7f3dead 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3631,37 +3631,9 @@ void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
 }
 #endif
 
-/**
- * __do_kmalloc - allocate memory
- * @size: how many bytes of memory are required.
- * @flags: the type of memory to allocate (see kmalloc).
- * @caller: function caller for debug tracking of the caller
- *
- * Return: pointer to the allocated memory or %NULL in case of error
- */
-static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
-					  unsigned long caller)
-{
-	struct kmem_cache *cachep;
-	void *ret;
-
-	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
-		return NULL;
-	cachep = kmalloc_slab(size, flags);
-	if (unlikely(ZERO_OR_NULL_PTR(cachep)))
-		return cachep;
-	ret = slab_alloc(cachep, NULL, flags, size, caller);
-
-	ret = kasan_kmalloc(cachep, ret, size, flags);
-	trace_kmalloc(caller, ret, cachep,
-		      size, cachep->size, flags);
-
-	return ret;
-}
-
 void *__kmalloc(size_t size, gfp_t flags)
 {
-	return __do_kmalloc(size, flags, _RET_IP_);
+	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
 }
 EXPORT_SYMBOL(__kmalloc);
 
diff --git a/mm/slub.c b/mm/slub.c
index c82a4062f730..f9929ba858ec 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4402,29 +4402,6 @@ static int __init setup_slub_min_objects(char *str)
 
 __setup("slub_min_objects=", setup_slub_min_objects);
 
-void *__kmalloc(size_t size, gfp_t flags)
-{
-	struct kmem_cache *s;
-	void *ret;
-
-	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
-		return kmalloc_large(size, flags);
-
-	s = kmalloc_slab(size, flags);
-
-	if (unlikely(ZERO_OR_NULL_PTR(s)))
-		return s;
-
-	ret = slab_alloc(s, NULL, flags, _RET_IP_, size);
-
-	trace_kmalloc(_RET_IP_, ret, s, size, s->size, flags);
-
-	ret = kasan_kmalloc(s, ret, size, flags);
-
-	return ret;
-}
-EXPORT_SYMBOL(__kmalloc);
-
 static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 {
 	struct page *page;
@@ -4442,7 +4419,8 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 	return kmalloc_large_node_hook(ptr, size, flags);
 }
 
-void *__kmalloc_node(size_t size, gfp_t flags, int node)
+static __always_inline
+void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
 {
 	struct kmem_cache *s;
 	void *ret;
@@ -4450,7 +4428,7 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
 	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
 		ret = kmalloc_large_node(size, flags, node);
 
-		trace_kmalloc_node(_RET_IP_, ret, NULL,
+		trace_kmalloc_node(caller, ret, NULL,
 				   size, PAGE_SIZE << get_order(size),
 				   flags, node);
 
@@ -4462,16 +4440,28 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
 	if (unlikely(ZERO_OR_NULL_PTR(s)))
 		return s;
 
-	ret = slab_alloc_node(s, NULL, flags, node, _RET_IP_, size);
+	ret = slab_alloc_node(s, NULL, flags, node, caller, size);
 
-	trace_kmalloc_node(_RET_IP_, ret, s, size, s->size, flags, node);
+	trace_kmalloc_node(caller, ret, s, size, s->size, flags, node);
 
 	ret = kasan_kmalloc(s, ret, size, flags);
 
 	return ret;
 }
+
+void *__kmalloc_node(size_t size, gfp_t flags, int node)
+{
+	return __do_kmalloc_node(size, flags, node, _RET_IP_);
+}
 EXPORT_SYMBOL(__kmalloc_node);
 
+void *__kmalloc(size_t size, gfp_t flags)
+{
+	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
+}
+EXPORT_SYMBOL(__kmalloc);
+
+
 #ifdef CONFIG_HARDENED_USERCOPY
 /*
  * Rejects incorrectly sized objects and objects that are to be copied
@@ -4905,32 +4895,9 @@ int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags)
 }
 
 void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
-					int node, unsigned long caller)
+				  int node, unsigned long caller)
 {
-	struct kmem_cache *s;
-	void *ret;
-
-	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
-		ret = kmalloc_large_node(size, gfpflags, node);
-
-		trace_kmalloc_node(caller, ret, NULL,
-				   size, PAGE_SIZE << get_order(size),
-				   gfpflags, node);
-
-		return ret;
-	}
-
-	s = kmalloc_slab(size, gfpflags);
-
-	if (unlikely(ZERO_OR_NULL_PTR(s)))
-		return s;
-
-	ret = slab_alloc_node(s, NULL, gfpflags, node, caller, size);
-
-	/* Honor the call site pointer we received. */
-	trace_kmalloc_node(caller, ret, s, size, s->size, gfpflags, node);
-
-	return ret;
+	return __do_kmalloc_node(size, gfpflags, node, caller);
 }
 EXPORT_SYMBOL(__kmalloc_node_track_caller);
 
-- 
2.32.0



* [PATCH v4 06/17] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large()
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
                   ` (4 preceding siblings ...)
  2022-08-17 10:18 ` [PATCH v4 05/17] mm/sl[au]b: factor out __do_kmalloc_node() Hyeonggon Yoo
@ 2022-08-17 10:18 ` Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 07/17] mm/slub: move kmalloc_large_node() to slab_common.c Hyeonggon Yoo
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel

There is no caller of kmalloc_order_trace() except kmalloc_large().
Fold it into kmalloc_large() and remove kmalloc_order{,_trace}().

Also add the tracepoint to kmalloc_large() that was previously
emitted by kmalloc_order_trace().
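
Interface-wise the change boils down to this (sketch of the before
and after, both taken from the slab.h hunk below):

/* before: a header-inlined wrapper around kmalloc_order{,_trace}() */
static __always_inline __alloc_size(1) void *kmalloc_large(size_t size, gfp_t flags)
{
	unsigned int order = get_order(size);
	return kmalloc_order_trace(size, flags, order);
}

/* after: one out-of-line function that allocates the pages and emits
 * the kmalloc tracepoint itself */
void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment
					      __alloc_size(1);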

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slab.h | 22 ++--------------------
 mm/slab_common.c     | 17 ++++-------------
 2 files changed, 6 insertions(+), 33 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index a0e57df3d5a4..15a4c59da59e 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -489,26 +489,8 @@ static __always_inline void *kmem_cache_alloc_node_trace(struct kmem_cache *s, g
 }
 #endif /* CONFIG_TRACING */
 
-extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) __assume_page_alignment
-									 __alloc_size(1);
-
-#ifdef CONFIG_TRACING
-extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
-				__assume_page_alignment __alloc_size(1);
-#else
-static __always_inline __alloc_size(1) void *kmalloc_order_trace(size_t size, gfp_t flags,
-								 unsigned int order)
-{
-	return kmalloc_order(size, flags, order);
-}
-#endif
-
-static __always_inline __alloc_size(1) void *kmalloc_large(size_t size, gfp_t flags)
-{
-	unsigned int order = get_order(size);
-	return kmalloc_order_trace(size, flags, order);
-}
-
+void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment
+					      __alloc_size(1);
 /**
  * kmalloc - allocate memory
  * @size: how many bytes of memory are required.
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 17996649cfe3..8b1988544b89 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -905,16 +905,16 @@ gfp_t kmalloc_fix_flags(gfp_t flags)
  * directly to the page allocator. We use __GFP_COMP, because we will need to
  * know the allocation order to free the pages properly in kfree.
  */
-void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
+void *kmalloc_large(size_t size, gfp_t flags)
 {
 	void *ret = NULL;
 	struct page *page;
+	unsigned int order = get_order(size);
 
 	if (unlikely(flags & GFP_SLAB_BUG_MASK))
 		flags = kmalloc_fix_flags(flags);
 
-	flags |= __GFP_COMP;
-	page = alloc_pages(flags, order);
+	page = alloc_pages(flags | __GFP_COMP, order);
 	if (likely(page)) {
 		ret = page_address(page);
 		mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
@@ -923,19 +923,10 @@ void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
 	ret = kasan_kmalloc_large(ret, size, flags);
 	/* As ret might get tagged, call kmemleak hook after KASAN. */
 	kmemleak_alloc(ret, size, 1, flags);
-	return ret;
-}
-EXPORT_SYMBOL(kmalloc_order);
-
-#ifdef CONFIG_TRACING
-void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
-{
-	void *ret = kmalloc_order(size, flags, order);
 	trace_kmalloc(_RET_IP_, ret, NULL, size, PAGE_SIZE << order, flags);
 	return ret;
 }
-EXPORT_SYMBOL(kmalloc_order_trace);
-#endif
+EXPORT_SYMBOL(kmalloc_large);
 
 #ifdef CONFIG_SLAB_FREELIST_RANDOM
 /* Randomize a generic freelist */
-- 
2.32.0



* [PATCH v4 07/17] mm/slub: move kmalloc_large_node() to slab_common.c
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
                   ` (5 preceding siblings ...)
  2022-08-17 10:18 ` [PATCH v4 06/17] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large() Hyeonggon Yoo
@ 2022-08-17 10:18 ` Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 08/17] mm/slab_common: kmalloc_node: pass large requests to page allocator Hyeonggon Yoo
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel

In a later patch SLAB will also pass requests larger than an order-1
page to the page allocator. Move kmalloc_large_node() to slab_common.c
so that it can be shared.

Fold kmalloc_large_node_hook() into kmalloc_large_node() as there is
no other caller.
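
The folded hook is just the KASAN and kmemleak calls, now open-coded
at the tail of kmalloc_large_node() (taken from the hunk below):

	/* formerly kmalloc_large_node_hook(ptr, size, flags): */
	ptr = kasan_kmalloc_large(ptr, size, flags);
	/* As ptr might get tagged, call kmemleak hook after KASAN. */
	kmemleak_alloc(ptr, size, 1, flags);

	return ptr;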

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slab.h |  4 ++++
 mm/slab_common.c     | 22 ++++++++++++++++++++++
 mm/slub.c            | 25 -------------------------
 3 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 15a4c59da59e..082499306098 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -491,6 +491,10 @@ static __always_inline void *kmem_cache_alloc_node_trace(struct kmem_cache *s, g
 
 void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment
 					      __alloc_size(1);
+
+void *kmalloc_large_node(size_t size, gfp_t flags, int node) __assume_page_alignment
+							     __alloc_size(1);
+
 /**
  * kmalloc - allocate memory
  * @size: how many bytes of memory are required.
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 8b1988544b89..1b9101f9cb21 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -928,6 +928,28 @@ void *kmalloc_large(size_t size, gfp_t flags)
 }
 EXPORT_SYMBOL(kmalloc_large);
 
+void *kmalloc_large_node(size_t size, gfp_t flags, int node)
+{
+	struct page *page;
+	void *ptr = NULL;
+	unsigned int order = get_order(size);
+
+	flags |= __GFP_COMP;
+	page = alloc_pages_node(node, flags, order);
+	if (page) {
+		ptr = page_address(page);
+		mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
+				      PAGE_SIZE << order);
+	}
+
+	ptr = kasan_kmalloc_large(ptr, size, flags);
+	/* As ptr might get tagged, call kmemleak hook after KASAN. */
+	kmemleak_alloc(ptr, size, 1, flags);
+
+	return ptr;
+}
+EXPORT_SYMBOL(kmalloc_large_node);
+
 #ifdef CONFIG_SLAB_FREELIST_RANDOM
 /* Randomize a generic freelist */
 static void freelist_randomize(struct rnd_state *state, unsigned int *list,
diff --git a/mm/slub.c b/mm/slub.c
index f9929ba858ec..5e7819ade2c4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1704,14 +1704,6 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
  * Hooks for other subsystems that check memory allocations. In a typical
  * production configuration these hooks all should produce no code at all.
  */
-static inline void *kmalloc_large_node_hook(void *ptr, size_t size, gfp_t flags)
-{
-	ptr = kasan_kmalloc_large(ptr, size, flags);
-	/* As ptr might get tagged, call kmemleak hook after KASAN. */
-	kmemleak_alloc(ptr, size, 1, flags);
-	return ptr;
-}
-
 static __always_inline void kfree_hook(void *x)
 {
 	kmemleak_free(x);
@@ -4402,23 +4394,6 @@ static int __init setup_slub_min_objects(char *str)
 
 __setup("slub_min_objects=", setup_slub_min_objects);
 
-static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
-{
-	struct page *page;
-	void *ptr = NULL;
-	unsigned int order = get_order(size);
-
-	flags |= __GFP_COMP;
-	page = alloc_pages_node(node, flags, order);
-	if (page) {
-		ptr = page_address(page);
-		mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
-				      PAGE_SIZE << order);
-	}
-
-	return kmalloc_large_node_hook(ptr, size, flags);
-}
-
 static __always_inline
 void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
 {
-- 
2.32.0



* [PATCH v4 08/17] mm/slab_common: kmalloc_node: pass large requests to page allocator
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
                   ` (6 preceding siblings ...)
  2022-08-17 10:18 ` [PATCH v4 07/17] mm/slub: move kmalloc_large_node() to slab_common.c Hyeonggon Yoo
@ 2022-08-17 10:18 ` Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 09/17] mm/slab_common: cleanup kmalloc_large() Hyeonggon Yoo
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel

Now that kmalloc_large_node() is in common code, make kmalloc_node()
pass large requests to the page allocator via kmalloc_large_node().

One problem is that kmalloc_large_node() currently has no tracepoint.
Instead of simply adding one, provide kmalloc_large_node{,_notrace}()
and pick the variant per caller, so that the tracepoint records a
useful call-site address for both the inlined kmalloc_node() and
__kmalloc_node_track_caller() when large objects are allocated.
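
A sketch of who traces what afterwards (fragments taken from the
hunks below; kernel context assumed):

/* inlined kmalloc_node() in <linux/slab.h>, constant-size path:
 * the traced wrapper reports the inlined call site via _RET_IP_ */
if (__builtin_constant_p(size) && size > KMALLOC_MAX_CACHE_SIZE)
	return kmalloc_large_node(size, flags, node);

/* inside SLUB's __do_kmalloc_node(..., unsigned long caller), which
 * backs __kmalloc_node() and __kmalloc_node_track_caller(): the
 * untraced helper is used and the tracepoint gets the known caller */
if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
	ret = kmalloc_large_node_notrace(size, flags, node);
	trace_kmalloc_node(caller, ret, NULL, size,
			   PAGE_SIZE << get_order(size), flags, node);
	return ret;
}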

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slab.h | 26 +++++++++++++++++++-------
 mm/slab.h            |  2 ++
 mm/slab_common.c     | 11 ++++++++++-
 mm/slub.c            |  2 +-
 4 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 082499306098..fd2e129fc813 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -571,23 +571,35 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
 	return __kmalloc(size, flags);
 }
 
+#ifndef CONFIG_SLOB
 static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
 {
-#ifndef CONFIG_SLOB
-	if (__builtin_constant_p(size) &&
-		size <= KMALLOC_MAX_CACHE_SIZE) {
-		unsigned int i = kmalloc_index(size);
+	if (__builtin_constant_p(size)) {
+		unsigned int index;
 
-		if (!i)
+		if (size > KMALLOC_MAX_CACHE_SIZE)
+			return kmalloc_large_node(size, flags, node);
+
+		index = kmalloc_index(size);
+
+		if (!index)
 			return ZERO_SIZE_PTR;
 
 		return kmem_cache_alloc_node_trace(
-				kmalloc_caches[kmalloc_type(flags)][i],
+				kmalloc_caches[kmalloc_type(flags)][index],
 						flags, node, size);
 	}
-#endif
 	return __kmalloc_node(size, flags, node);
 }
+#else
+static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
+{
+	if (__builtin_constant_p(size) && size > KMALLOC_MAX_CACHE_SIZE)
+		return kmalloc_large_node(size, flags, node);
+
+	return __kmalloc_node(size, flags, node);
+}
+#endif
 
 /**
  * kmalloc_array - allocate memory for an array.
diff --git a/mm/slab.h b/mm/slab.h
index 4ec82bec15ec..40322bcf07be 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -273,6 +273,8 @@ void create_kmalloc_caches(slab_flags_t);
 
 /* Find the kmalloc slab corresponding for a certain size */
 struct kmem_cache *kmalloc_slab(size_t, gfp_t);
+
+void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node);
 #endif
 
 gfp_t kmalloc_fix_flags(gfp_t flags);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1b9101f9cb21..7a0942d54424 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -928,7 +928,7 @@ void *kmalloc_large(size_t size, gfp_t flags)
 }
 EXPORT_SYMBOL(kmalloc_large);
 
-void *kmalloc_large_node(size_t size, gfp_t flags, int node)
+void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
 {
 	struct page *page;
 	void *ptr = NULL;
@@ -948,6 +948,15 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 
 	return ptr;
 }
+
+void *kmalloc_large_node(size_t size, gfp_t flags, int node)
+{
+	void *ret = kmalloc_large_node_notrace(size, flags, node);
+
+	trace_kmalloc_node(_RET_IP_, ret, NULL, size,
+			   PAGE_SIZE << get_order(size), flags, node);
+	return ret;
+}
 EXPORT_SYMBOL(kmalloc_large_node);
 
 #ifdef CONFIG_SLAB_FREELIST_RANDOM
diff --git a/mm/slub.c b/mm/slub.c
index 5e7819ade2c4..165fe87af204 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4401,7 +4401,7 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
 	void *ret;
 
 	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
-		ret = kmalloc_large_node(size, flags, node);
+		ret = kmalloc_large_node_notrace(size, flags, node);
 
 		trace_kmalloc_node(caller, ret, NULL,
 				   size, PAGE_SIZE << get_order(size),
-- 
2.32.0



* [PATCH v4 09/17] mm/slab_common: cleanup kmalloc_large()
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
                   ` (7 preceding siblings ...)
  2022-08-17 10:18 ` [PATCH v4 08/17] mm/slab_common: kmalloc_node: pass large requests to page allocator Hyeonggon Yoo
@ 2022-08-17 10:18 ` Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 10/17] mm/slab: kmalloc: pass requests larger than order-1 page to page allocator Hyeonggon Yoo
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel

Now that kmalloc_large() and kmalloc_large_node() do mostly the same
job, make kmalloc_large() a wrapper of kmalloc_large_node_notrace().

In the meantime, add the missing GFP_SLAB_BUG_MASK fixup
(kmalloc_fix_flags()) to kmalloc_large_node_notrace().
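
The wrapper itself (from the hunk below); note that
kmalloc_large_node_notrace() now performs the GFP_SLAB_BUG_MASK fixup
that kmalloc_large() used to do on its own:

void *kmalloc_large(size_t size, gfp_t flags)
{
	void *ret = kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);

	trace_kmalloc(_RET_IP_, ret, NULL, size,
		      PAGE_SIZE << get_order(size), flags);
	return ret;
}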

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab_common.c | 35 +++++++++++++----------------------
 1 file changed, 13 insertions(+), 22 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 7a0942d54424..51ccd0545816 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -905,28 +905,6 @@ gfp_t kmalloc_fix_flags(gfp_t flags)
  * directly to the page allocator. We use __GFP_COMP, because we will need to
  * know the allocation order to free the pages properly in kfree.
  */
-void *kmalloc_large(size_t size, gfp_t flags)
-{
-	void *ret = NULL;
-	struct page *page;
-	unsigned int order = get_order(size);
-
-	if (unlikely(flags & GFP_SLAB_BUG_MASK))
-		flags = kmalloc_fix_flags(flags);
-
-	page = alloc_pages(flags | __GFP_COMP, order);
-	if (likely(page)) {
-		ret = page_address(page);
-		mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
-				      PAGE_SIZE << order);
-	}
-	ret = kasan_kmalloc_large(ret, size, flags);
-	/* As ret might get tagged, call kmemleak hook after KASAN. */
-	kmemleak_alloc(ret, size, 1, flags);
-	trace_kmalloc(_RET_IP_, ret, NULL, size, PAGE_SIZE << order, flags);
-	return ret;
-}
-EXPORT_SYMBOL(kmalloc_large);
 
 void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
 {
@@ -934,6 +912,9 @@ void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
 	void *ptr = NULL;
 	unsigned int order = get_order(size);
 
+	if (unlikely(flags & GFP_SLAB_BUG_MASK))
+		flags = kmalloc_fix_flags(flags);
+
 	flags |= __GFP_COMP;
 	page = alloc_pages_node(node, flags, order);
 	if (page) {
@@ -949,6 +930,16 @@ void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
 	return ptr;
 }
 
+void *kmalloc_large(size_t size, gfp_t flags)
+{
+	void *ret = kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
+
+	trace_kmalloc(_RET_IP_, ret, NULL, size,
+		      PAGE_SIZE << get_order(size), flags);
+	return ret;
+}
+EXPORT_SYMBOL(kmalloc_large);
+
 void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 {
 	void *ret = kmalloc_large_node_notrace(size, flags, node);
-- 
2.32.0



* [PATCH v4 10/17] mm/slab: kmalloc: pass requests larger than order-1 page to page allocator
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
                   ` (8 preceding siblings ...)
  2022-08-17 10:18 ` [PATCH v4 09/17] mm/slab_common: cleanup kmalloc_large() Hyeonggon Yoo
@ 2022-08-17 10:18 ` Hyeonggon Yoo
  2022-10-14 20:58   ` Guenter Roeck
  2022-08-17 10:18 ` [PATCH v4 11/17] mm/sl[au]b: introduce common alloc/free functions without tracepoint Hyeonggon Yoo
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel

There is not much benefit in SLAB serving large objects from its own
kmalloc caches. Let's pass large requests to the page allocator, as
SLUB does, for better maintenance of the common code.
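
The visible effect on SLAB's free path, condensed from the kfree()
hunk below: objects are classified by folio, and non-slab folios are
handed straight back to the page allocator:

	folio = virt_to_folio(objp);
	if (!folio_test_slab(folio)) {
		/* came from the page allocator via a large kmalloc */
		free_large_kmalloc(folio, (void *)objp);
		return;
	}

	c = folio_slab(folio)->slab_cache;
	/* then the usual SLAB debug checks and __cache_free() */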

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slab.h | 23 ++++-------------
 mm/slab.c            | 60 +++++++++++++++++++++++++++++++-------------
 mm/slab.h            |  3 +++
 mm/slab_common.c     | 25 ++++++++++++------
 mm/slub.c            | 19 --------------
 5 files changed, 68 insertions(+), 62 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index fd2e129fc813..4ee5b2fed164 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -243,27 +243,17 @@ static inline unsigned int arch_slab_minalign(void)
 
 #ifdef CONFIG_SLAB
 /*
- * The largest kmalloc size supported by the SLAB allocators is
- * 32 megabyte (2^25) or the maximum allocatable page order if that is
- * less than 32 MB.
- *
- * WARNING: Its not easy to increase this value since the allocators have
- * to do various tricks to work around compiler limitations in order to
- * ensure proper constant folding.
+ * SLAB and SLUB directly allocates requests fitting in to an order-1 page
+ * (PAGE_SIZE*2).  Larger requests are passed to the page allocator.
  */
-#define KMALLOC_SHIFT_HIGH	((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
-				(MAX_ORDER + PAGE_SHIFT - 1) : 25)
-#define KMALLOC_SHIFT_MAX	KMALLOC_SHIFT_HIGH
+#define KMALLOC_SHIFT_HIGH	(PAGE_SHIFT + 1)
+#define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT - 1)
 #ifndef KMALLOC_SHIFT_LOW
 #define KMALLOC_SHIFT_LOW	5
 #endif
 #endif
 
 #ifdef CONFIG_SLUB
-/*
- * SLUB directly allocates requests fitting in to an order-1 page
- * (PAGE_SIZE*2).  Larger requests are passed to the page allocator.
- */
 #define KMALLOC_SHIFT_HIGH	(PAGE_SHIFT + 1)
 #define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT - 1)
 #ifndef KMALLOC_SHIFT_LOW
@@ -415,10 +405,6 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
 	if (size <= 512 * 1024) return 19;
 	if (size <= 1024 * 1024) return 20;
 	if (size <=  2 * 1024 * 1024) return 21;
-	if (size <=  4 * 1024 * 1024) return 22;
-	if (size <=  8 * 1024 * 1024) return 23;
-	if (size <=  16 * 1024 * 1024) return 24;
-	if (size <=  32 * 1024 * 1024) return 25;
 
 	if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
 		BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
@@ -428,6 +414,7 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
 	/* Will never be reached. Needed because the compiler may complain */
 	return -1;
 }
+static_assert(PAGE_SHIFT <= 20);
 #define kmalloc_index(s) __kmalloc_index(s, true)
 #endif /* !CONFIG_SLOB */
 
diff --git a/mm/slab.c b/mm/slab.c
index 8c08d7f3dead..10c9af904410 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3585,11 +3585,19 @@ __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
 	struct kmem_cache *cachep;
 	void *ret;
 
-	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
-		return NULL;
+	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
+		ret = kmalloc_large_node_notrace(size, flags, node);
+
+		trace_kmalloc_node(caller, ret, NULL, size,
+				   PAGE_SIZE << get_order(size),
+				   flags, node);
+		return ret;
+	}
+
 	cachep = kmalloc_slab(size, flags);
 	if (unlikely(ZERO_OR_NULL_PTR(cachep)))
 		return cachep;
+
 	ret = kmem_cache_alloc_node_trace(cachep, flags, node, size);
 	ret = kasan_kmalloc(cachep, ret, size, flags);
 
@@ -3664,17 +3672,27 @@ EXPORT_SYMBOL(kmem_cache_free);
 
 void kmem_cache_free_bulk(struct kmem_cache *orig_s, size_t size, void **p)
 {
-	struct kmem_cache *s;
-	size_t i;
 
 	local_irq_disable();
-	for (i = 0; i < size; i++) {
+	for (int i = 0; i < size; i++) {
 		void *objp = p[i];
+		struct kmem_cache *s;
 
-		if (!orig_s) /* called via kfree_bulk */
-			s = virt_to_cache(objp);
-		else
+		if (!orig_s) {
+			struct folio *folio = virt_to_folio(objp);
+
+			/* called via kfree_bulk */
+			if (!folio_test_slab(folio)) {
+				local_irq_enable();
+				free_large_kmalloc(folio, objp);
+				local_irq_disable();
+				continue;
+			}
+			s = folio_slab(folio)->slab_cache;
+		} else {
 			s = cache_from_obj(orig_s, objp);
+		}
+
 		if (!s)
 			continue;
 
@@ -3703,20 +3721,24 @@ void kfree(const void *objp)
 {
 	struct kmem_cache *c;
 	unsigned long flags;
+	struct folio *folio;
 
 	trace_kfree(_RET_IP_, objp);
 
 	if (unlikely(ZERO_OR_NULL_PTR(objp)))
 		return;
-	local_irq_save(flags);
-	kfree_debugcheck(objp);
-	c = virt_to_cache(objp);
-	if (!c) {
-		local_irq_restore(flags);
+
+	folio = virt_to_folio(objp);
+	if (!folio_test_slab(folio)) {
+		free_large_kmalloc(folio, (void *)objp);
 		return;
 	}
-	debug_check_no_locks_freed(objp, c->object_size);
 
+	c = folio_slab(folio)->slab_cache;
+
+	local_irq_save(flags);
+	kfree_debugcheck(objp);
+	debug_check_no_locks_freed(objp, c->object_size);
 	debug_check_no_obj_freed(objp, c->object_size);
 	__cache_free(c, (void *)objp, _RET_IP_);
 	local_irq_restore(flags);
@@ -4138,15 +4160,17 @@ void __check_heap_object(const void *ptr, unsigned long n,
 size_t __ksize(const void *objp)
 {
 	struct kmem_cache *c;
-	size_t size;
+	struct folio *folio;
 
 	BUG_ON(!objp);
 	if (unlikely(objp == ZERO_SIZE_PTR))
 		return 0;
 
-	c = virt_to_cache(objp);
-	size = c ? c->object_size : 0;
+	folio = virt_to_folio(objp);
+	if (!folio_test_slab(folio))
+		return folio_size(folio);
 
-	return size;
+	c = folio_slab(folio)->slab_cache;
+	return c->object_size;
 }
 EXPORT_SYMBOL(__ksize);
diff --git a/mm/slab.h b/mm/slab.h
index 40322bcf07be..381ba3e6b2a1 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -660,6 +660,9 @@ static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
 		print_tracking(cachep, x);
 	return cachep;
 }
+
+void free_large_kmalloc(struct folio *folio, void *object);
+
 #endif /* CONFIG_SLOB */
 
 static inline size_t slab_ksize(const struct kmem_cache *s)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 51ccd0545816..5a2e81f42ee9 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -744,8 +744,8 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
 
 /*
  * kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot time.
- * kmalloc_index() supports up to 2^25=32MB, so the final entry of the table is
- * kmalloc-32M.
+ * kmalloc_index() supports up to 2^21=2MB, so the final entry of the table is
+ * kmalloc-2M.
  */
 const struct kmalloc_info_struct kmalloc_info[] __initconst = {
 	INIT_KMALLOC_INFO(0, 0),
@@ -769,11 +769,7 @@ const struct kmalloc_info_struct kmalloc_info[] __initconst = {
 	INIT_KMALLOC_INFO(262144, 256k),
 	INIT_KMALLOC_INFO(524288, 512k),
 	INIT_KMALLOC_INFO(1048576, 1M),
-	INIT_KMALLOC_INFO(2097152, 2M),
-	INIT_KMALLOC_INFO(4194304, 4M),
-	INIT_KMALLOC_INFO(8388608, 8M),
-	INIT_KMALLOC_INFO(16777216, 16M),
-	INIT_KMALLOC_INFO(33554432, 32M)
+	INIT_KMALLOC_INFO(2097152, 2M)
 };
 
 /*
@@ -886,6 +882,21 @@ void __init create_kmalloc_caches(slab_flags_t flags)
 	/* Kmalloc array is now usable */
 	slab_state = UP;
 }
+
+void free_large_kmalloc(struct folio *folio, void *object)
+{
+	unsigned int order = folio_order(folio);
+
+	if (WARN_ON_ONCE(order == 0))
+		pr_warn_once("object pointer: 0x%p\n", object);
+
+	kmemleak_free(object);
+	kasan_kfree_large(object);
+
+	mod_lruvec_page_state(folio_page(folio, 0), NR_SLAB_UNRECLAIMABLE_B,
+			      -(PAGE_SIZE << order));
+	__free_pages(folio_page(folio, 0), order);
+}
 #endif /* !CONFIG_SLOB */
 
 gfp_t kmalloc_fix_flags(gfp_t flags)
diff --git a/mm/slub.c b/mm/slub.c
index 165fe87af204..a659874c5d44 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1704,12 +1704,6 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
  * Hooks for other subsystems that check memory allocations. In a typical
  * production configuration these hooks all should produce no code at all.
  */
-static __always_inline void kfree_hook(void *x)
-{
-	kmemleak_free(x);
-	kasan_kfree_large(x);
-}
-
 static __always_inline bool slab_free_hook(struct kmem_cache *s,
 						void *x, bool init)
 {
@@ -3550,19 +3544,6 @@ struct detached_freelist {
 	struct kmem_cache *s;
 };
 
-static inline void free_large_kmalloc(struct folio *folio, void *object)
-{
-	unsigned int order = folio_order(folio);
-
-	if (WARN_ON_ONCE(order == 0))
-		pr_warn_once("object pointer: 0x%p\n", object);
-
-	kfree_hook(object);
-	mod_lruvec_page_state(folio_page(folio, 0), NR_SLAB_UNRECLAIMABLE_B,
-			      -(PAGE_SIZE << order));
-	__free_pages(folio_page(folio, 0), order);
-}
-
 /*
  * This function progressively scans the array with free objects (with
  * a limited look ahead) and extract objects belonging to the same
-- 
2.32.0



* [PATCH v4 11/17] mm/sl[au]b: introduce common alloc/free functions without tracepoint
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
                   ` (9 preceding siblings ...)
  2022-08-17 10:18 ` [PATCH v4 10/17] mm/slab: kmalloc: pass requests larger than order-1 page to page allocator Hyeonggon Yoo
@ 2022-08-17 10:18 ` Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 12/17] mm/sl[au]b: generalize kmalloc subsystem Hyeonggon Yoo
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel

To unify the kmalloc functions in a later patch, introduce common
alloc/free functions that do not have tracepoints.
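
In SLUB both new functions are one-liners around the existing
internals, with SLAB mirroring them on top of slab_alloc_node() and
__cache_free() (taken from the hunks below):

void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
			      int node, size_t orig_size,
			      unsigned long caller)
{
	return slab_alloc_node(s, NULL, gfpflags, node,
			       caller, orig_size);
}

void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller)
{
	slab_free(s, virt_to_slab(x), x, NULL, &x, 1, caller);
}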

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab.c | 36 +++++++++++++++++++++++++++++-------
 mm/slab.h |  4 ++++
 mm/slub.c | 13 +++++++++++++
 3 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 10c9af904410..aa61851b0a07 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3560,6 +3560,14 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 }
 EXPORT_SYMBOL(kmem_cache_alloc_node);
 
+void *__kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
+			     int nodeid, size_t orig_size,
+			     unsigned long caller)
+{
+	return slab_alloc_node(cachep, NULL, flags, nodeid,
+			       orig_size, caller);
+}
+
 #ifdef CONFIG_TRACING
 void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
 				  gfp_t flags,
@@ -3645,6 +3653,26 @@ void *__kmalloc(size_t size, gfp_t flags)
 }
 EXPORT_SYMBOL(__kmalloc);
 
+static __always_inline
+void __do_kmem_cache_free(struct kmem_cache *cachep, void *objp,
+			  unsigned long caller)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	debug_check_no_locks_freed(objp, cachep->object_size);
+	if (!(cachep->flags & SLAB_DEBUG_OBJECTS))
+		debug_check_no_obj_freed(objp, cachep->object_size);
+	__cache_free(cachep, objp, caller);
+	local_irq_restore(flags);
+}
+
+void __kmem_cache_free(struct kmem_cache *cachep, void *objp,
+		       unsigned long caller)
+{
+	__do_kmem_cache_free(cachep, objp, caller);
+}
+
 /**
  * kmem_cache_free - Deallocate an object
  * @cachep: The cache the allocation was from.
@@ -3655,18 +3683,12 @@ EXPORT_SYMBOL(__kmalloc);
  */
 void kmem_cache_free(struct kmem_cache *cachep, void *objp)
 {
-	unsigned long flags;
 	cachep = cache_from_obj(cachep, objp);
 	if (!cachep)
 		return;
 
 	trace_kmem_cache_free(_RET_IP_, objp, cachep->name);
-	local_irq_save(flags);
-	debug_check_no_locks_freed(objp, cachep->object_size);
-	if (!(cachep->flags & SLAB_DEBUG_OBJECTS))
-		debug_check_no_obj_freed(objp, cachep->object_size);
-	__cache_free(cachep, objp, _RET_IP_);
-	local_irq_restore(flags);
+	__do_kmem_cache_free(cachep, objp, _RET_IP_);
 }
 EXPORT_SYMBOL(kmem_cache_free);
 
diff --git a/mm/slab.h b/mm/slab.h
index 381ba3e6b2a1..4e90ed0ab635 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -275,6 +275,10 @@ void create_kmalloc_caches(slab_flags_t);
 struct kmem_cache *kmalloc_slab(size_t, gfp_t);
 
 void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node);
+void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
+			      int node, size_t orig_size,
+			      unsigned long caller);
+void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller);
 #endif
 
 gfp_t kmalloc_fix_flags(gfp_t flags);
diff --git a/mm/slub.c b/mm/slub.c
index a659874c5d44..a11f78c2647c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3262,6 +3262,14 @@ void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
 }
 EXPORT_SYMBOL(kmem_cache_alloc_lru);
 
+void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
+			      int node, size_t orig_size,
+			      unsigned long caller)
+{
+	return slab_alloc_node(s, NULL, gfpflags, node,
+			       caller, orig_size);
+}
+
 #ifdef CONFIG_TRACING
 void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
 {
@@ -3526,6 +3534,11 @@ void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr)
 }
 #endif
 
+void __kmem_cache_free(struct kmem_cache *s, void *x, unsigned long caller)
+{
+	slab_free(s, virt_to_slab(x), x, NULL, &x, 1, caller);
+}
+
 void kmem_cache_free(struct kmem_cache *s, void *x)
 {
 	s = cache_from_obj(s, x);
-- 
2.32.0



* [PATCH v4 12/17] mm/sl[au]b: generalize kmalloc subsystem
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
                   ` (10 preceding siblings ...)
  2022-08-17 10:18 ` [PATCH v4 11/17] mm/sl[au]b: introduce common alloc/free functions without tracepoint Hyeonggon Yoo
@ 2022-08-17 10:18 ` Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 13/17] mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace() Hyeonggon Yoo
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel

Now everything in the kmalloc subsystem can be generalized.
Let's do it!

Generalize __do_kmalloc_node(), __kmalloc_node_track_caller(),
kfree(), __ksize(), __kmalloc(), __kmalloc_node() and move them
to slab_common.c.

In the meantime, rename kmalloc_large_node_notrace()
to __kmalloc_large_node() and make it static as it's now only called in
slab_common.c.
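
After this patch the layout is (condensed from the hunks below):

/* defined once in mm/slab_common.c, for SLAB and SLUB alike: */
void *__kmalloc(size_t size, gfp_t flags);
void *__kmalloc_node(size_t size, gfp_t flags, int node);
void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
				  int node, unsigned long caller);
void kfree(const void *object);
size_t __ksize(const void *object);

/* the three kmalloc entry points share this helper, which dispatches
 * either to __kmalloc_large_node() (now static here) or to the
 * per-allocator __kmem_cache_alloc_node(): */
static __always_inline
void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller);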

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab.c        | 108 ----------------------------------------------
 mm/slab.h        |   1 -
 mm/slab_common.c | 109 +++++++++++++++++++++++++++++++++++++++++++++--
 mm/slub.c        |  87 -------------------------------------
 4 files changed, 106 insertions(+), 199 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index aa61851b0a07..5b234e3ab165 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3587,44 +3587,6 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
 EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
 #endif
 
-static __always_inline void *
-__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
-{
-	struct kmem_cache *cachep;
-	void *ret;
-
-	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
-		ret = kmalloc_large_node_notrace(size, flags, node);
-
-		trace_kmalloc_node(caller, ret, NULL, size,
-				   PAGE_SIZE << get_order(size),
-				   flags, node);
-		return ret;
-	}
-
-	cachep = kmalloc_slab(size, flags);
-	if (unlikely(ZERO_OR_NULL_PTR(cachep)))
-		return cachep;
-
-	ret = kmem_cache_alloc_node_trace(cachep, flags, node, size);
-	ret = kasan_kmalloc(cachep, ret, size, flags);
-
-	return ret;
-}
-
-void *__kmalloc_node(size_t size, gfp_t flags, int node)
-{
-	return __do_kmalloc_node(size, flags, node, _RET_IP_);
-}
-EXPORT_SYMBOL(__kmalloc_node);
-
-void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
-		int node, unsigned long caller)
-{
-	return __do_kmalloc_node(size, flags, node, caller);
-}
-EXPORT_SYMBOL(__kmalloc_node_track_caller);
-
 #ifdef CONFIG_PRINTK
 void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
 {
@@ -3647,12 +3609,6 @@ void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
 }
 #endif
 
-void *__kmalloc(size_t size, gfp_t flags)
-{
-	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
-}
-EXPORT_SYMBOL(__kmalloc);
-
 static __always_inline
 void __do_kmem_cache_free(struct kmem_cache *cachep, void *objp,
 			  unsigned long caller)
@@ -3730,43 +3686,6 @@ void kmem_cache_free_bulk(struct kmem_cache *orig_s, size_t size, void **p)
 }
 EXPORT_SYMBOL(kmem_cache_free_bulk);
 
-/**
- * kfree - free previously allocated memory
- * @objp: pointer returned by kmalloc.
- *
- * If @objp is NULL, no operation is performed.
- *
- * Don't free memory not originally allocated by kmalloc()
- * or you will run into trouble.
- */
-void kfree(const void *objp)
-{
-	struct kmem_cache *c;
-	unsigned long flags;
-	struct folio *folio;
-
-	trace_kfree(_RET_IP_, objp);
-
-	if (unlikely(ZERO_OR_NULL_PTR(objp)))
-		return;
-
-	folio = virt_to_folio(objp);
-	if (!folio_test_slab(folio)) {
-		free_large_kmalloc(folio, (void *)objp);
-		return;
-	}
-
-	c = folio_slab(folio)->slab_cache;
-
-	local_irq_save(flags);
-	kfree_debugcheck(objp);
-	debug_check_no_locks_freed(objp, c->object_size);
-	debug_check_no_obj_freed(objp, c->object_size);
-	__cache_free(c, (void *)objp, _RET_IP_);
-	local_irq_restore(flags);
-}
-EXPORT_SYMBOL(kfree);
-
 /*
  * This initializes kmem_cache_node or resizes various caches for all nodes.
  */
@@ -4169,30 +4088,3 @@ void __check_heap_object(const void *ptr, unsigned long n,
 	usercopy_abort("SLAB object", cachep->name, to_user, offset, n);
 }
 #endif /* CONFIG_HARDENED_USERCOPY */
-
-/**
- * __ksize -- Uninstrumented ksize.
- * @objp: pointer to the object
- *
- * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
- * safety checks as ksize() with KASAN instrumentation enabled.
- *
- * Return: size of the actual memory used by @objp in bytes
- */
-size_t __ksize(const void *objp)
-{
-	struct kmem_cache *c;
-	struct folio *folio;
-
-	BUG_ON(!objp);
-	if (unlikely(objp == ZERO_SIZE_PTR))
-		return 0;
-
-	folio = virt_to_folio(objp);
-	if (!folio_test_slab(folio))
-		return folio_size(folio);
-
-	c = folio_slab(folio)->slab_cache;
-	return c->object_size;
-}
-EXPORT_SYMBOL(__ksize);
diff --git a/mm/slab.h b/mm/slab.h
index 4e90ed0ab635..4d8330d57573 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -274,7 +274,6 @@ void create_kmalloc_caches(slab_flags_t);
 /* Find the kmalloc slab corresponding for a certain size */
 struct kmem_cache *kmalloc_slab(size_t, gfp_t);
 
-void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node);
 void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
 			      int node, size_t orig_size,
 			      unsigned long caller);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 5a2e81f42ee9..c8242b4e2223 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -897,6 +897,109 @@ void free_large_kmalloc(struct folio *folio, void *object)
 			      -(PAGE_SIZE << order));
 	__free_pages(folio_page(folio, 0), order);
 }
+
+static void *__kmalloc_large_node(size_t size, gfp_t flags, int node);
+static __always_inline
+void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
+{
+	struct kmem_cache *s;
+	void *ret;
+
+	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
+		ret = __kmalloc_large_node(size, flags, node);
+		trace_kmalloc_node(caller, ret, NULL,
+				   size, PAGE_SIZE << get_order(size),
+				   flags, node);
+		return ret;
+	}
+
+	s = kmalloc_slab(size, flags);
+
+	if (unlikely(ZERO_OR_NULL_PTR(s)))
+		return s;
+
+	ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
+	ret = kasan_kmalloc(s, ret, size, flags);
+	trace_kmalloc_node(caller, ret, s, size,
+			   s->size, flags, node);
+	return ret;
+}
+
+void *__kmalloc_node(size_t size, gfp_t flags, int node)
+{
+	return __do_kmalloc_node(size, flags, node, _RET_IP_);
+}
+EXPORT_SYMBOL(__kmalloc_node);
+
+void *__kmalloc(size_t size, gfp_t flags)
+{
+	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
+}
+EXPORT_SYMBOL(__kmalloc);
+
+void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
+				  int node, unsigned long caller)
+{
+	return __do_kmalloc_node(size, flags, node, caller);
+}
+EXPORT_SYMBOL(__kmalloc_node_track_caller);
+
+/**
+ * kfree - free previously allocated memory
+ * @objp: pointer returned by kmalloc.
+ *
+ * If @objp is NULL, no operation is performed.
+ *
+ * Don't free memory not originally allocated by kmalloc()
+ * or you will run into trouble.
+ */
+void kfree(const void *object)
+{
+	struct folio *folio;
+	struct slab *slab;
+	struct kmem_cache *s;
+
+	trace_kfree(_RET_IP_, object);
+
+	if (unlikely(ZERO_OR_NULL_PTR(object)))
+		return;
+
+	folio = virt_to_folio(object);
+	if (unlikely(!folio_test_slab(folio))) {
+		free_large_kmalloc(folio, (void *)object);
+		return;
+	}
+
+	slab = folio_slab(folio);
+	s = slab->slab_cache;
+	__kmem_cache_free(s, (void *)object, _RET_IP_);
+}
+EXPORT_SYMBOL(kfree);
+
+/**
+ * __ksize -- Uninstrumented ksize.
+ * @objp: pointer to the object
+ *
+ * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
+ * safety checks as ksize() with KASAN instrumentation enabled.
+ *
+ * Return: size of the actual memory used by @objp in bytes
+ */
+size_t __ksize(const void *object)
+{
+	struct folio *folio;
+
+	if (unlikely(object == ZERO_SIZE_PTR))
+		return 0;
+
+	folio = virt_to_folio(object);
+
+	if (unlikely(!folio_test_slab(folio)))
+		return folio_size(folio);
+
+	return slab_ksize(folio_slab(folio)->slab_cache);
+}
+EXPORT_SYMBOL(__ksize);
 #endif /* !CONFIG_SLOB */
 
 gfp_t kmalloc_fix_flags(gfp_t flags)
@@ -917,7 +1020,7 @@ gfp_t kmalloc_fix_flags(gfp_t flags)
  * know the allocation order to free the pages properly in kfree.
  */
 
-void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
+void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
 {
 	struct page *page;
 	void *ptr = NULL;
@@ -943,7 +1046,7 @@ void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
 
 void *kmalloc_large(size_t size, gfp_t flags)
 {
-	void *ret = kmalloc_large_node_notrace(size, flags, NUMA_NO_NODE);
+	void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);
 
 	trace_kmalloc(_RET_IP_, ret, NULL, size,
 		      PAGE_SIZE << get_order(size), flags);
@@ -953,7 +1056,7 @@ EXPORT_SYMBOL(kmalloc_large);
 
 void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 {
-	void *ret = kmalloc_large_node_notrace(size, flags, node);
+	void *ret = __kmalloc_large_node(size, flags, node);
 
 	trace_kmalloc_node(_RET_IP_, ret, NULL, size,
 			   PAGE_SIZE << get_order(size), flags, node);
diff --git a/mm/slub.c b/mm/slub.c
index a11f78c2647c..cd49785d59e1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4388,49 +4388,6 @@ static int __init setup_slub_min_objects(char *str)
 
 __setup("slub_min_objects=", setup_slub_min_objects);
 
-static __always_inline
-void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
-{
-	struct kmem_cache *s;
-	void *ret;
-
-	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
-		ret = kmalloc_large_node_notrace(size, flags, node);
-
-		trace_kmalloc_node(caller, ret, NULL,
-				   size, PAGE_SIZE << get_order(size),
-				   flags, node);
-
-		return ret;
-	}
-
-	s = kmalloc_slab(size, flags);
-
-	if (unlikely(ZERO_OR_NULL_PTR(s)))
-		return s;
-
-	ret = slab_alloc_node(s, NULL, flags, node, caller, size);
-
-	trace_kmalloc_node(caller, ret, s, size, s->size, flags, node);
-
-	ret = kasan_kmalloc(s, ret, size, flags);
-
-	return ret;
-}
-
-void *__kmalloc_node(size_t size, gfp_t flags, int node)
-{
-	return __do_kmalloc_node(size, flags, node, _RET_IP_);
-}
-EXPORT_SYMBOL(__kmalloc_node);
-
-void *__kmalloc(size_t size, gfp_t flags)
-{
-	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
-}
-EXPORT_SYMBOL(__kmalloc);
-
-
 #ifdef CONFIG_HARDENED_USERCOPY
 /*
  * Rejects incorrectly sized objects and objects that are to be copied
@@ -4481,43 +4438,6 @@ void __check_heap_object(const void *ptr, unsigned long n,
 }
 #endif /* CONFIG_HARDENED_USERCOPY */
 
-size_t __ksize(const void *object)
-{
-	struct folio *folio;
-
-	if (unlikely(object == ZERO_SIZE_PTR))
-		return 0;
-
-	folio = virt_to_folio(object);
-
-	if (unlikely(!folio_test_slab(folio)))
-		return folio_size(folio);
-
-	return slab_ksize(folio_slab(folio)->slab_cache);
-}
-EXPORT_SYMBOL(__ksize);
-
-void kfree(const void *x)
-{
-	struct folio *folio;
-	struct slab *slab;
-	void *object = (void *)x;
-
-	trace_kfree(_RET_IP_, x);
-
-	if (unlikely(ZERO_OR_NULL_PTR(x)))
-		return;
-
-	folio = virt_to_folio(x);
-	if (unlikely(!folio_test_slab(folio))) {
-		free_large_kmalloc(folio, object);
-		return;
-	}
-	slab = folio_slab(folio);
-	slab_free(slab->slab_cache, slab, object, NULL, &object, 1, _RET_IP_);
-}
-EXPORT_SYMBOL(kfree);
-
 #define SHRINK_PROMOTE_MAX 32
 
 /*
@@ -4863,13 +4783,6 @@ int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags)
 	return 0;
 }
 
-void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
-				  int node, unsigned long caller)
-{
-	return __do_kmalloc_node(size, gfpflags, node, caller);
-}
-EXPORT_SYMBOL(__kmalloc_node_track_caller);
-
 #ifdef CONFIG_SYSFS
 static int count_inuse(struct slab *slab)
 {
-- 
2.32.0



* [PATCH v4 13/17] mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace()
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
                   ` (11 preceding siblings ...)
  2022-08-17 10:18 ` [PATCH v4 12/17] mm/sl[au]b: generalize kmalloc subsystem Hyeonggon Yoo
@ 2022-08-17 10:18 ` Hyeonggon Yoo
  2022-08-23 15:04   ` Vlastimil Babka
  2022-08-17 10:18 ` [PATCH v4 14/17] mm/slab_common: unify NUMA and UMA version of tracepoints Hyeonggon Yoo
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel

This patch does:
	- Despite its name, kmem_cache_alloc[_node]_trace() is a hook for
	  inlined kmalloc. So rename it to kmalloc[_node]_trace().

	- Move its implementation to slab_common.c by using
	  __kmem_cache_alloc_node(), but keep CONFIG_TRACING=n variants to
	  save a function call when CONFIG_TRACING=n (see the sketch after
	  this list).

	- Use __assume_kmalloc_alignment for kmalloc[_node]_trace()
	  instead of __assume_slab_alignment. Generally kmalloc has
	  larger alignment requirements.
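
A condensed sketch of the resulting split for kmalloc_trace() (the
_node variant is analogous; taken from the hunks below):

#ifdef CONFIG_TRACING
/* out of line in mm/slab_common.c, built on __kmem_cache_alloc_node() */
void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
		    __assume_kmalloc_alignment __alloc_size(3);
#else
/* CONFIG_TRACING=n: stays inline, so inlined kmalloc() pays no extra call */
static __always_inline __alloc_size(3)
void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
{
	void *ret = kmem_cache_alloc(s, flags);

	ret = kasan_kmalloc(s, ret, size, flags);
	return ret;
}
#endif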

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 include/linux/slab.h | 27 ++++++++++++++-------------
 mm/slab.c            | 35 -----------------------------------
 mm/slab_common.c     | 27 +++++++++++++++++++++++++++
 mm/slub.c            | 27 ---------------------------
 4 files changed, 41 insertions(+), 75 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 4ee5b2fed164..c8e485ce8815 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -449,16 +449,16 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assum
 									 __malloc;
 
 #ifdef CONFIG_TRACING
-extern void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
-				   __assume_slab_alignment __alloc_size(3);
-
-extern void *kmem_cache_alloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
-					 int node, size_t size) __assume_slab_alignment
-								__alloc_size(4);
+void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
+		    __assume_kmalloc_alignment __alloc_size(3);
 
+void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
+			 int node, size_t size) __assume_kmalloc_alignment
+						__alloc_size(4);
 #else /* CONFIG_TRACING */
-static __always_inline __alloc_size(3) void *kmem_cache_alloc_trace(struct kmem_cache *s,
-								    gfp_t flags, size_t size)
+/* Save a function call when CONFIG_TRACING=n */
+static __always_inline __alloc_size(3)
+void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
 {
 	void *ret = kmem_cache_alloc(s, flags);
 
@@ -466,8 +466,9 @@ static __always_inline __alloc_size(3) void *kmem_cache_alloc_trace(struct kmem_
 	return ret;
 }
 
-static __always_inline void *kmem_cache_alloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
-							 int node, size_t size)
+static __always_inline __alloc_size(4)
+void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
+			 int node, size_t size)
 {
 	void *ret = kmem_cache_alloc_node(s, gfpflags, node);
 
@@ -550,7 +551,7 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
 		if (!index)
 			return ZERO_SIZE_PTR;
 
-		return kmem_cache_alloc_trace(
+		return kmalloc_trace(
 				kmalloc_caches[kmalloc_type(flags)][index],
 				flags, size);
 #endif
@@ -572,9 +573,9 @@ static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t fla
 		if (!index)
 			return ZERO_SIZE_PTR;
 
-		return kmem_cache_alloc_node_trace(
+		return kmalloc_node_trace(
 				kmalloc_caches[kmalloc_type(flags)][index],
-						flags, node, size);
+				flags, node, size);
 	}
 	return __kmalloc_node(size, flags, node);
 }
diff --git a/mm/slab.c b/mm/slab.c
index 5b234e3ab165..8d9d0fbf9792 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3519,22 +3519,6 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 }
 EXPORT_SYMBOL(kmem_cache_alloc_bulk);
 
-#ifdef CONFIG_TRACING
-void *
-kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
-{
-	void *ret;
-
-	ret = slab_alloc(cachep, NULL, flags, size, _RET_IP_);
-
-	ret = kasan_kmalloc(cachep, ret, size, flags);
-	trace_kmalloc(_RET_IP_, ret, cachep,
-		      size, cachep->size, flags);
-	return ret;
-}
-EXPORT_SYMBOL(kmem_cache_alloc_trace);
-#endif
-
 /**
  * kmem_cache_alloc_node - Allocate an object on the specified node
  * @cachep: The cache to allocate from.
@@ -3568,25 +3552,6 @@ void *__kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
 			       orig_size, caller);
 }
 
-#ifdef CONFIG_TRACING
-void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
-				  gfp_t flags,
-				  int nodeid,
-				  size_t size)
-{
-	void *ret;
-
-	ret = slab_alloc_node(cachep, NULL, flags, nodeid, size, _RET_IP_);
-
-	ret = kasan_kmalloc(cachep, ret, size, flags);
-	trace_kmalloc_node(_RET_IP_, ret, cachep,
-			   size, cachep->size,
-			   flags, nodeid);
-	return ret;
-}
-EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
-#endif
-
 #ifdef CONFIG_PRINTK
 void __kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
 {
diff --git a/mm/slab_common.c b/mm/slab_common.c
index c8242b4e2223..d8e8c41c12f1 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1000,6 +1000,33 @@ size_t __ksize(const void *object)
 	return slab_ksize(folio_slab(folio)->slab_cache);
 }
 EXPORT_SYMBOL(__ksize);
+
+#ifdef CONFIG_TRACING
+void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
+{
+	void *ret = __kmem_cache_alloc_node(s, gfpflags, NUMA_NO_NODE,
+					    size, _RET_IP_);
+
+	trace_kmalloc_node(_RET_IP_, ret, s, size, s->size,
+			   gfpflags, NUMA_NO_NODE);
+
+	ret = kasan_kmalloc(s, ret, size, gfpflags);
+	return ret;
+}
+EXPORT_SYMBOL(kmalloc_trace);
+
+void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
+			 int node, size_t size)
+{
+	void *ret = __kmem_cache_alloc_node(s, gfpflags, node, size, _RET_IP_);
+
+	trace_kmalloc_node(_RET_IP_, ret, s, size, s->size, gfpflags, node);
+
+	ret = kasan_kmalloc(s, ret, size, gfpflags);
+	return ret;
+}
+EXPORT_SYMBOL(kmalloc_node_trace);
+#endif /* !CONFIG_TRACING */
 #endif /* !CONFIG_SLOB */
 
 gfp_t kmalloc_fix_flags(gfp_t flags)
diff --git a/mm/slub.c b/mm/slub.c
index cd49785d59e1..7d7fd9d4e8fa 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3270,17 +3270,6 @@ void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
 			       caller, orig_size);
 }
 
-#ifdef CONFIG_TRACING
-void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
-{
-	void *ret = slab_alloc(s, NULL, gfpflags, _RET_IP_, size);
-	trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags);
-	ret = kasan_kmalloc(s, ret, size, gfpflags);
-	return ret;
-}
-EXPORT_SYMBOL(kmem_cache_alloc_trace);
-#endif
-
 void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
 {
 	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
@@ -3292,22 +3281,6 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
 }
 EXPORT_SYMBOL(kmem_cache_alloc_node);
 
-#ifdef CONFIG_TRACING
-void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
-				    gfp_t gfpflags,
-				    int node, size_t size)
-{
-	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
-
-	trace_kmalloc_node(_RET_IP_, ret, s,
-			   size, s->size, gfpflags, node);
-
-	ret = kasan_kmalloc(s, ret, size, gfpflags);
-	return ret;
-}
-EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
-#endif
-
 /*
  * Slow path handling. This may still be called frequently since objects
  * have a longer lifetime than the cpu slabs in most processing loads.
-- 
2.32.0



* [PATCH v4 14/17] mm/slab_common: unify NUMA and UMA version of tracepoints
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
                   ` (12 preceding siblings ...)
  2022-08-17 10:18 ` [PATCH v4 13/17] mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace() Hyeonggon Yoo
@ 2022-08-17 10:18 ` Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 15/17] mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using Hyeonggon Yoo
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel

Drop kmem_alloc event class, rename kmem_alloc_node to kmem_alloc, and
remove _node postfix for NUMA version of tracepoints.

This will break some tools that depend on {kmem_cache_alloc,kmalloc}_node,
but at this point maintaining both kmem_alloc and kmem_alloc_node
event classes does not make sense at all.
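
For in-kernel users that attached probes to the dropped _node events, the
switch is mechanical: register on the unified event and look at the node
argument. A rough sketch (not part of this series; probe_kmalloc and its
pr_debug message are made up for illustration, and the prototype is the
kmem_alloc one as of this patch):

#include <trace/events/kmem.h>	/* plus the usual module boilerplate */

static void probe_kmalloc(void *ignore, unsigned long call_site,
			  const void *ptr, struct kmem_cache *s,
			  size_t bytes_req, size_t bytes_alloc,
			  gfp_t gfp_flags, int node)
{
	/* what used to be the kmalloc_node event now has node != NUMA_NO_NODE */
	if (node != NUMA_NO_NODE)
		pr_debug("node-bound kmalloc: %zu bytes on node %d\n",
			 bytes_req, node);
}

static int __init kmalloc_probe_init(void)
{
	return register_trace_kmalloc(probe_kmalloc, NULL);
}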

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/trace/events/kmem.h | 60 ++-----------------------------------
 mm/slab.c                   |  9 +++---
 mm/slab_common.c            | 21 +++++--------
 mm/slob.c                   | 20 ++++++-------
 mm/slub.c                   |  6 ++--
 5 files changed, 27 insertions(+), 89 deletions(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 4cb51ace600d..e078ebcdc4b1 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -11,62 +11,6 @@
 
 DECLARE_EVENT_CLASS(kmem_alloc,
 
-	TP_PROTO(unsigned long call_site,
-		 const void *ptr,
-		 struct kmem_cache *s,
-		 size_t bytes_req,
-		 size_t bytes_alloc,
-		 gfp_t gfp_flags),
-
-	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags),
-
-	TP_STRUCT__entry(
-		__field(	unsigned long,	call_site	)
-		__field(	const void *,	ptr		)
-		__field(	size_t,		bytes_req	)
-		__field(	size_t,		bytes_alloc	)
-		__field(	unsigned long,	gfp_flags	)
-		__field(	bool,		accounted	)
-	),
-
-	TP_fast_assign(
-		__entry->call_site	= call_site;
-		__entry->ptr		= ptr;
-		__entry->bytes_req	= bytes_req;
-		__entry->bytes_alloc	= bytes_alloc;
-		__entry->gfp_flags	= (__force unsigned long)gfp_flags;
-		__entry->accounted	= IS_ENABLED(CONFIG_MEMCG_KMEM) ?
-					  ((gfp_flags & __GFP_ACCOUNT) ||
-					  (s && s->flags & SLAB_ACCOUNT)) : false;
-	),
-
-	TP_printk("call_site=%pS ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s accounted=%s",
-		(void *)__entry->call_site,
-		__entry->ptr,
-		__entry->bytes_req,
-		__entry->bytes_alloc,
-		show_gfp_flags(__entry->gfp_flags),
-		__entry->accounted ? "true" : "false")
-);
-
-DEFINE_EVENT(kmem_alloc, kmalloc,
-
-	TP_PROTO(unsigned long call_site, const void *ptr, struct kmem_cache *s,
-		 size_t bytes_req, size_t bytes_alloc, gfp_t gfp_flags),
-
-	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags)
-);
-
-DEFINE_EVENT(kmem_alloc, kmem_cache_alloc,
-
-	TP_PROTO(unsigned long call_site, const void *ptr, struct kmem_cache *s,
-		 size_t bytes_req, size_t bytes_alloc, gfp_t gfp_flags),
-
-	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags)
-);
-
-DECLARE_EVENT_CLASS(kmem_alloc_node,
-
 	TP_PROTO(unsigned long call_site,
 		 const void *ptr,
 		 struct kmem_cache *s,
@@ -109,7 +53,7 @@ DECLARE_EVENT_CLASS(kmem_alloc_node,
 		__entry->accounted ? "true" : "false")
 );
 
-DEFINE_EVENT(kmem_alloc_node, kmalloc_node,
+DEFINE_EVENT(kmem_alloc, kmalloc,
 
 	TP_PROTO(unsigned long call_site, const void *ptr,
 		 struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
@@ -118,7 +62,7 @@ DEFINE_EVENT(kmem_alloc_node, kmalloc_node,
 	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
 );
 
-DEFINE_EVENT(kmem_alloc_node, kmem_cache_alloc_node,
+DEFINE_EVENT(kmem_alloc, kmem_cache_alloc,
 
 	TP_PROTO(unsigned long call_site, const void *ptr,
 		 struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
diff --git a/mm/slab.c b/mm/slab.c
index 8d9d0fbf9792..2fd400203ac2 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3440,8 +3440,8 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
 {
 	void *ret = slab_alloc(cachep, lru, flags, cachep->object_size, _RET_IP_);
 
-	trace_kmem_cache_alloc(_RET_IP_, ret, cachep,
-			       cachep->object_size, cachep->size, flags);
+	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, cachep->object_size,
+			       cachep->size, flags, NUMA_NO_NODE);
 
 	return ret;
 }
@@ -3536,9 +3536,8 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 {
 	void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);
 
-	trace_kmem_cache_alloc_node(_RET_IP_, ret, cachep,
-				    cachep->object_size, cachep->size,
-				    flags, nodeid);
+	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, cachep->object_size,
+			       cachep->size, flags, nodeid);
 
 	return ret;
 }
diff --git a/mm/slab_common.c b/mm/slab_common.c
index d8e8c41c12f1..f34be57b00c8 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -907,9 +907,8 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
 
 	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
 		ret = __kmalloc_large_node(size, flags, node);
-		trace_kmalloc_node(caller, ret, NULL,
-				   size, PAGE_SIZE << get_order(size),
-				   flags, node);
+		trace_kmalloc(_RET_IP_, ret, NULL, size,
+			      PAGE_SIZE << get_order(size), flags, node);
 		return ret;
 	}
 
@@ -920,8 +919,7 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
 
 	ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
 	ret = kasan_kmalloc(s, ret, size, flags);
-	trace_kmalloc_node(caller, ret, s, size,
-			   s->size, flags, node);
+	trace_kmalloc(_RET_IP_, ret, s, size, s->size, flags, node);
 	return ret;
 }
 
@@ -1007,8 +1005,7 @@ void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
 	void *ret = __kmem_cache_alloc_node(s, gfpflags, NUMA_NO_NODE,
 					    size, _RET_IP_);
 
-	trace_kmalloc_node(_RET_IP_, ret, s, size, s->size,
-			   gfpflags, NUMA_NO_NODE);
+	trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, NUMA_NO_NODE);
 
 	ret = kasan_kmalloc(s, ret, size, gfpflags);
 	return ret;
@@ -1020,7 +1017,7 @@ void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
 {
 	void *ret = __kmem_cache_alloc_node(s, gfpflags, node, size, _RET_IP_);
 
-	trace_kmalloc_node(_RET_IP_, ret, s, size, s->size, gfpflags, node);
+	trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, node);
 
 	ret = kasan_kmalloc(s, ret, size, gfpflags);
 	return ret;
@@ -1076,7 +1073,7 @@ void *kmalloc_large(size_t size, gfp_t flags)
 	void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);
 
 	trace_kmalloc(_RET_IP_, ret, NULL, size,
-		      PAGE_SIZE << get_order(size), flags);
+		      PAGE_SIZE << get_order(size), flags, NUMA_NO_NODE);
 	return ret;
 }
 EXPORT_SYMBOL(kmalloc_large);
@@ -1085,8 +1082,8 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 {
 	void *ret = __kmalloc_large_node(size, flags, node);
 
-	trace_kmalloc_node(_RET_IP_, ret, NULL, size,
-			   PAGE_SIZE << get_order(size), flags, node);
+	trace_kmalloc(_RET_IP_, ret, NULL, size,
+		      PAGE_SIZE << get_order(size), flags, node);
 	return ret;
 }
 EXPORT_SYMBOL(kmalloc_large_node);
@@ -1421,8 +1418,6 @@ EXPORT_SYMBOL(ksize);
 /* Tracepoints definitions. */
 EXPORT_TRACEPOINT_SYMBOL(kmalloc);
 EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
-EXPORT_TRACEPOINT_SYMBOL(kmalloc_node);
-EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc_node);
 EXPORT_TRACEPOINT_SYMBOL(kfree);
 EXPORT_TRACEPOINT_SYMBOL(kmem_cache_free);
 
diff --git a/mm/slob.c b/mm/slob.c
index 96b08acd72ce..3208c56d8f82 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -507,8 +507,8 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 		*m = size;
 		ret = (void *)m + minalign;
 
-		trace_kmalloc_node(caller, ret, NULL,
-				   size, size + minalign, gfp, node);
+		trace_kmalloc(caller, ret, NULL, size,
+			      size + minalign, gfp, node);
 	} else {
 		unsigned int order = get_order(size);
 
@@ -516,8 +516,8 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 			gfp |= __GFP_COMP;
 		ret = slob_new_pages(gfp, order, node);
 
-		trace_kmalloc_node(caller, ret, NULL,
-				   size, PAGE_SIZE << order, gfp, node);
+		trace_kmalloc(caller, ret, NULL, size,
+			      PAGE_SIZE << order, gfp, node);
 	}
 
 	kmemleak_alloc(ret, size, 1, gfp);
@@ -608,14 +608,14 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
 
 	if (c->size < PAGE_SIZE) {
 		b = slob_alloc(c->size, flags, c->align, node, 0);
-		trace_kmem_cache_alloc_node(_RET_IP_, b, NULL, c->object_size,
-					    SLOB_UNITS(c->size) * SLOB_UNIT,
-					    flags, node);
+		trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
+				       SLOB_UNITS(c->size) * SLOB_UNIT,
+				       flags, node);
 	} else {
 		b = slob_new_pages(flags, get_order(c->size), node);
-		trace_kmem_cache_alloc_node(_RET_IP_, b, NULL, c->object_size,
-					    PAGE_SIZE << get_order(c->size),
-					    flags, node);
+		trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
+				       PAGE_SIZE << get_order(c->size),
+				       flags, node);
 	}
 
 	if (b && c->ctor) {
diff --git a/mm/slub.c b/mm/slub.c
index 7d7fd9d4e8fa..22e4ccf06638 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3244,7 +3244,7 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
 	void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
 
 	trace_kmem_cache_alloc(_RET_IP_, ret, s, s->object_size,
-				s->size, gfpflags);
+				s->size, gfpflags, NUMA_NO_NODE);
 
 	return ret;
 }
@@ -3274,8 +3274,8 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
 {
 	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
 
-	trace_kmem_cache_alloc_node(_RET_IP_, ret, s,
-				    s->object_size, s->size, gfpflags, node);
+	trace_kmem_cache_alloc(_RET_IP_, ret, s, s->object_size,
+			       s->size, gfpflags, node);
 
 	return ret;
 }
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 15/17] mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
                   ` (13 preceding siblings ...)
  2022-08-17 10:18 ` [PATCH v4 14/17] mm/slab_common: unify NUMA and UMA version of tracepoints Hyeonggon Yoo
@ 2022-08-17 10:18 ` Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 16/17] mm/slab_common: move declaration of __ksize() to mm/slab.h Hyeonggon Yoo
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel, Vasily Averin

Drop kmem_alloc event class, and define kmalloc and kmem_cache_alloc
using TRACE_EVENT() macro.

And then this patch does:
   - Do not pass pointer to struct kmem_cache to trace_kmalloc.
     gfp flag is enough to know if it's accounted or not
     (see the small illustration below this list).
   - Avoid dereferencing s->object_size and s->size when not using kmem_cache_alloc event.
   - Avoid dereferencing s->name when not using kmem_cache_free event.
   - Adjust s->size to SLOB_UNITS(s->size) * SLOB_UNIT in SLOB
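
A small illustration of the first point (not from the patch): as I understand
it, a kmalloc() allocation only lands in an accounted (kmalloc-cg) cache when
the caller passes __GFP_ACCOUNT in the first place, so the gfp mask the event
already records is enough:

	size_t sz = 64;
	void *p = kmalloc(sz, GFP_KERNEL | __GFP_ACCOUNT); /* accounted=true in the kmalloc event (with CONFIG_MEMCG_KMEM) */
	void *q = kmalloc(sz, GFP_KERNEL);                 /* accounted=false */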

Cc: Vasily Averin <vasily.averin@linux.dev>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/trace/events/kmem.h | 64 ++++++++++++++++++++++++-------------
 mm/slab.c                   |  8 ++---
 mm/slab_common.c            | 16 +++++-----
 mm/slob.c                   | 19 +++++------
 mm/slub.c                   |  8 ++---
 5 files changed, 64 insertions(+), 51 deletions(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index e078ebcdc4b1..8c6f96604244 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -9,17 +9,15 @@
 #include <linux/tracepoint.h>
 #include <trace/events/mmflags.h>
 
-DECLARE_EVENT_CLASS(kmem_alloc,
+TRACE_EVENT(kmem_cache_alloc,
 
 	TP_PROTO(unsigned long call_site,
 		 const void *ptr,
 		 struct kmem_cache *s,
-		 size_t bytes_req,
-		 size_t bytes_alloc,
 		 gfp_t gfp_flags,
 		 int node),
 
-	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node),
+	TP_ARGS(call_site, ptr, s, gfp_flags, node),
 
 	TP_STRUCT__entry(
 		__field(	unsigned long,	call_site	)
@@ -34,13 +32,13 @@ DECLARE_EVENT_CLASS(kmem_alloc,
 	TP_fast_assign(
 		__entry->call_site	= call_site;
 		__entry->ptr		= ptr;
-		__entry->bytes_req	= bytes_req;
-		__entry->bytes_alloc	= bytes_alloc;
+		__entry->bytes_req	= s->object_size;
+		__entry->bytes_alloc	= s->size;
 		__entry->gfp_flags	= (__force unsigned long)gfp_flags;
 		__entry->node		= node;
 		__entry->accounted	= IS_ENABLED(CONFIG_MEMCG_KMEM) ?
 					  ((gfp_flags & __GFP_ACCOUNT) ||
-					  (s && s->flags & SLAB_ACCOUNT)) : false;
+					  (s->flags & SLAB_ACCOUNT)) : false;
 	),
 
 	TP_printk("call_site=%pS ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d accounted=%s",
@@ -53,22 +51,44 @@ DECLARE_EVENT_CLASS(kmem_alloc,
 		__entry->accounted ? "true" : "false")
 );
 
-DEFINE_EVENT(kmem_alloc, kmalloc,
+TRACE_EVENT(kmalloc,
 
-	TP_PROTO(unsigned long call_site, const void *ptr,
-		 struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
-		 gfp_t gfp_flags, int node),
+	TP_PROTO(unsigned long call_site,
+		 const void *ptr,
+		 size_t bytes_req,
+		 size_t bytes_alloc,
+		 gfp_t gfp_flags,
+		 int node),
 
-	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
-);
+	TP_ARGS(call_site, ptr, bytes_req, bytes_alloc, gfp_flags, node),
 
-DEFINE_EVENT(kmem_alloc, kmem_cache_alloc,
+	TP_STRUCT__entry(
+		__field(	unsigned long,	call_site	)
+		__field(	const void *,	ptr		)
+		__field(	size_t,		bytes_req	)
+		__field(	size_t,		bytes_alloc	)
+		__field(	unsigned long,	gfp_flags	)
+		__field(	int,		node		)
+	),
 
-	TP_PROTO(unsigned long call_site, const void *ptr,
-		 struct kmem_cache *s, size_t bytes_req, size_t bytes_alloc,
-		 gfp_t gfp_flags, int node),
+	TP_fast_assign(
+		__entry->call_site	= call_site;
+		__entry->ptr		= ptr;
+		__entry->bytes_req	= bytes_req;
+		__entry->bytes_alloc	= bytes_alloc;
+		__entry->gfp_flags	= (__force unsigned long)gfp_flags;
+		__entry->node		= node;
+	),
 
-	TP_ARGS(call_site, ptr, s, bytes_req, bytes_alloc, gfp_flags, node)
+	TP_printk("call_site=%pS ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d accounted=%s",
+		(void *)__entry->call_site,
+		__entry->ptr,
+		__entry->bytes_req,
+		__entry->bytes_alloc,
+		show_gfp_flags(__entry->gfp_flags),
+		__entry->node,
+		(IS_ENABLED(CONFIG_MEMCG_KMEM) &&
+		 (__entry->gfp_flags & __GFP_ACCOUNT)) ? "true" : "false")
 );
 
 TRACE_EVENT(kfree,
@@ -93,20 +113,20 @@ TRACE_EVENT(kfree,
 
 TRACE_EVENT(kmem_cache_free,
 
-	TP_PROTO(unsigned long call_site, const void *ptr, const char *name),
+	TP_PROTO(unsigned long call_site, const void *ptr, const struct kmem_cache *s),
 
-	TP_ARGS(call_site, ptr, name),
+	TP_ARGS(call_site, ptr, s),
 
 	TP_STRUCT__entry(
 		__field(	unsigned long,	call_site	)
 		__field(	const void *,	ptr		)
-		__string(	name,	name	)
+		__string(	name,		s->name		)
 	),
 
 	TP_fast_assign(
 		__entry->call_site	= call_site;
 		__entry->ptr		= ptr;
-		__assign_str(name, name);
+		__assign_str(name, s->name);
 	),
 
 	TP_printk("call_site=%pS ptr=%p name=%s",
diff --git a/mm/slab.c b/mm/slab.c
index 2fd400203ac2..a5486ff8362a 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3440,8 +3440,7 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
 {
 	void *ret = slab_alloc(cachep, lru, flags, cachep->object_size, _RET_IP_);
 
-	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, cachep->object_size,
-			       cachep->size, flags, NUMA_NO_NODE);
+	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, flags, NUMA_NO_NODE);
 
 	return ret;
 }
@@ -3536,8 +3535,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 {
 	void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);
 
-	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, cachep->object_size,
-			       cachep->size, flags, nodeid);
+	trace_kmem_cache_alloc(_RET_IP_, ret, cachep, flags, nodeid);
 
 	return ret;
 }
@@ -3607,7 +3605,7 @@ void kmem_cache_free(struct kmem_cache *cachep, void *objp)
 	if (!cachep)
 		return;
 
-	trace_kmem_cache_free(_RET_IP_, objp, cachep->name);
+	trace_kmem_cache_free(_RET_IP_, objp, cachep);
 	__do_kmem_cache_free(cachep, objp, _RET_IP_);
 }
 EXPORT_SYMBOL(kmem_cache_free);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index f34be57b00c8..e53016c9a6e9 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -907,7 +907,7 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
 
 	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
 		ret = __kmalloc_large_node(size, flags, node);
-		trace_kmalloc(_RET_IP_, ret, NULL, size,
+		trace_kmalloc(_RET_IP_, ret, size,
 			      PAGE_SIZE << get_order(size), flags, node);
 		return ret;
 	}
@@ -919,7 +919,7 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
 
 	ret = __kmem_cache_alloc_node(s, flags, node, size, caller);
 	ret = kasan_kmalloc(s, ret, size, flags);
-	trace_kmalloc(_RET_IP_, ret, s, size, s->size, flags, node);
+	trace_kmalloc(_RET_IP_, ret, size, s->size, flags, node);
 	return ret;
 }
 
@@ -1005,7 +1005,7 @@ void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
 	void *ret = __kmem_cache_alloc_node(s, gfpflags, NUMA_NO_NODE,
 					    size, _RET_IP_);
 
-	trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, NUMA_NO_NODE);
+	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
 
 	ret = kasan_kmalloc(s, ret, size, gfpflags);
 	return ret;
@@ -1017,7 +1017,7 @@ void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
 {
 	void *ret = __kmem_cache_alloc_node(s, gfpflags, node, size, _RET_IP_);
 
-	trace_kmalloc(_RET_IP_, ret, s, size, s->size, gfpflags, node);
+	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);
 
 	ret = kasan_kmalloc(s, ret, size, gfpflags);
 	return ret;
@@ -1072,8 +1072,8 @@ void *kmalloc_large(size_t size, gfp_t flags)
 {
 	void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);
 
-	trace_kmalloc(_RET_IP_, ret, NULL, size,
-		      PAGE_SIZE << get_order(size), flags, NUMA_NO_NODE);
+	trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << get_order(size),
+		      flags, NUMA_NO_NODE);
 	return ret;
 }
 EXPORT_SYMBOL(kmalloc_large);
@@ -1082,8 +1082,8 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 {
 	void *ret = __kmalloc_large_node(size, flags, node);
 
-	trace_kmalloc(_RET_IP_, ret, NULL, size,
-		      PAGE_SIZE << get_order(size), flags, node);
+	trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << get_order(size),
+		      flags, node);
 	return ret;
 }
 EXPORT_SYMBOL(kmalloc_large_node);
diff --git a/mm/slob.c b/mm/slob.c
index 3208c56d8f82..771af84576bf 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -507,8 +507,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 		*m = size;
 		ret = (void *)m + minalign;
 
-		trace_kmalloc(caller, ret, NULL, size,
-			      size + minalign, gfp, node);
+		trace_kmalloc(caller, ret, size, size + minalign, gfp, node);
 	} else {
 		unsigned int order = get_order(size);
 
@@ -516,8 +515,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 			gfp |= __GFP_COMP;
 		ret = slob_new_pages(gfp, order, node);
 
-		trace_kmalloc(caller, ret, NULL, size,
-			      PAGE_SIZE << order, gfp, node);
+		trace_kmalloc(caller, ret, size, PAGE_SIZE << order, gfp, node);
 	}
 
 	kmemleak_alloc(ret, size, 1, gfp);
@@ -594,6 +592,9 @@ int __kmem_cache_create(struct kmem_cache *c, slab_flags_t flags)
 		/* leave room for rcu footer at the end of object */
 		c->size += sizeof(struct slob_rcu);
 	}
+
+	/* Actual size allocated */
+	c->size = SLOB_UNITS(c->size) * SLOB_UNIT;
 	c->flags = flags;
 	return 0;
 }
@@ -608,14 +609,10 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
 
 	if (c->size < PAGE_SIZE) {
 		b = slob_alloc(c->size, flags, c->align, node, 0);
-		trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
-				       SLOB_UNITS(c->size) * SLOB_UNIT,
-				       flags, node);
+		trace_kmem_cache_alloc(_RET_IP_, b, c, flags, node);
 	} else {
 		b = slob_new_pages(flags, get_order(c->size), node);
-		trace_kmem_cache_alloc(_RET_IP_, b, NULL, c->object_size,
-				       PAGE_SIZE << get_order(c->size),
-				       flags, node);
+		trace_kmem_cache_alloc(_RET_IP_, b, c, flags, node);
 	}
 
 	if (b && c->ctor) {
@@ -671,7 +668,7 @@ static void kmem_rcu_free(struct rcu_head *head)
 void kmem_cache_free(struct kmem_cache *c, void *b)
 {
 	kmemleak_free_recursive(b, c->flags);
-	trace_kmem_cache_free(_RET_IP_, b, c->name);
+	trace_kmem_cache_free(_RET_IP_, b, c);
 	if (unlikely(c->flags & SLAB_TYPESAFE_BY_RCU)) {
 		struct slob_rcu *slob_rcu;
 		slob_rcu = b + (c->size - sizeof(struct slob_rcu));
diff --git a/mm/slub.c b/mm/slub.c
index 22e4ccf06638..8083a6ee5f15 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3243,8 +3243,7 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
 {
 	void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
 
-	trace_kmem_cache_alloc(_RET_IP_, ret, s, s->object_size,
-				s->size, gfpflags, NUMA_NO_NODE);
+	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
 
 	return ret;
 }
@@ -3274,8 +3273,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
 {
 	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
 
-	trace_kmem_cache_alloc(_RET_IP_, ret, s, s->object_size,
-			       s->size, gfpflags, node);
+	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, node);
 
 	return ret;
 }
@@ -3517,7 +3515,7 @@ void kmem_cache_free(struct kmem_cache *s, void *x)
 	s = cache_from_obj(s, x);
 	if (!s)
 		return;
-	trace_kmem_cache_free(_RET_IP_, x, s->name);
+	trace_kmem_cache_free(_RET_IP_, x, s);
 	slab_free(s, virt_to_slab(x), x, NULL, &x, 1, _RET_IP_);
 }
 EXPORT_SYMBOL(kmem_cache_free);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 16/17] mm/slab_common: move declaration of __ksize() to mm/slab.h
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
                   ` (14 preceding siblings ...)
  2022-08-17 10:18 ` [PATCH v4 15/17] mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using Hyeonggon Yoo
@ 2022-08-17 10:18 ` Hyeonggon Yoo
  2022-08-17 10:18 ` [PATCH v4 17/17] mm/sl[au]b: check if large object is valid in __ksize() Hyeonggon Yoo
  2022-08-23 15:16 ` [PATCH v4 00/17] common kmalloc v4 Vlastimil Babka
  17 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel

__ksize() is only called by KASAN. Remove export symbol and move
declaration to mm/slab.h as we don't want to grow its callers.
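
In practice (illustration only, with objp being any kmalloc()ed pointer),
ksize() stays the interface for everything outside mm/, while __ksize()
becomes mm-internal:

	size_t a = ksize(objp);    /* still declared in <linux/slab.h> and exported */
	size_t b = __ksize(objp);  /* now declared only in mm/slab.h and no longer
				      exported, so module code cannot use it */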

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slab.h |  1 -
 mm/slab.h            |  2 ++
 mm/slab_common.c     | 11 +----------
 mm/slob.c            |  1 -
 4 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index c8e485ce8815..9b592e611cb1 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -187,7 +187,6 @@ int kmem_cache_shrink(struct kmem_cache *s);
 void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __alloc_size(2);
 void kfree(const void *objp);
 void kfree_sensitive(const void *objp);
-size_t __ksize(const void *objp);
 size_t ksize(const void *objp);
 #ifdef CONFIG_PRINTK
 bool kmem_valid_obj(void *object);
diff --git a/mm/slab.h b/mm/slab.h
index 4d8330d57573..65023f000d42 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -668,6 +668,8 @@ void free_large_kmalloc(struct folio *folio, void *object);
 
 #endif /* CONFIG_SLOB */
 
+size_t __ksize(const void *objp);
+
 static inline size_t slab_ksize(const struct kmem_cache *s)
 {
 #ifndef CONFIG_SLUB
diff --git a/mm/slab_common.c b/mm/slab_common.c
index e53016c9a6e9..9c273a5fb0d7 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -974,15 +974,7 @@ void kfree(const void *object)
 }
 EXPORT_SYMBOL(kfree);
 
-/**
- * __ksize -- Uninstrumented ksize.
- * @objp: pointer to the object
- *
- * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
- * safety checks as ksize() with KASAN instrumentation enabled.
- *
- * Return: size of the actual memory used by @objp in bytes
- */
+/* Uninstrumented ksize. Only called by KASAN. */
 size_t __ksize(const void *object)
 {
 	struct folio *folio;
@@ -997,7 +989,6 @@ size_t __ksize(const void *object)
 
 	return slab_ksize(folio_slab(folio)->slab_cache);
 }
-EXPORT_SYMBOL(__ksize);
 
 #ifdef CONFIG_TRACING
 void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
diff --git a/mm/slob.c b/mm/slob.c
index 771af84576bf..45a061b8ba38 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -584,7 +584,6 @@ size_t __ksize(const void *block)
 	m = (unsigned int *)(block - align);
 	return SLOB_UNITS(*m) * SLOB_UNIT;
 }
-EXPORT_SYMBOL(__ksize);
 
 int __kmem_cache_create(struct kmem_cache *c, slab_flags_t flags)
 {
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 17/17] mm/sl[au]b: check if large object is valid in __ksize()
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
                   ` (15 preceding siblings ...)
  2022-08-17 10:18 ` [PATCH v4 16/17] mm/slab_common: move declaration of __ksize() to mm/slab.h Hyeonggon Yoo
@ 2022-08-17 10:18 ` Hyeonggon Yoo
  2022-08-23 15:12   ` Vlastimil Babka
  2022-08-23 15:16 ` [PATCH v4 00/17] common kmalloc v4 Vlastimil Babka
  17 siblings, 1 reply; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-17 10:18 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin
  Cc: Hyeonggon Yoo, linux-mm, linux-kernel, Marco Elver

If address of large object is not beginning of folio or size of
the folio is too small, it must be invalid. BUG() in such cases.
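
As an illustration of what this catches (not from the patch, and assuming
ksize() still funnels into __ksize()):

	/* anything above KMALLOC_MAX_CACHE_SIZE is now backed by the page allocator */
	void *p = kmalloc(KMALLOC_MAX_CACHE_SIZE + 1, GFP_KERNEL);

	ksize(p);      /* fine: p is the start of the folio */
	ksize(p + 16); /* pointer into the middle of the object: hits the new BUG() */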

Cc: Marco Elver <elver@google.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab_common.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 9c273a5fb0d7..98d029212682 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -984,8 +984,11 @@ size_t __ksize(const void *object)
 
 	folio = virt_to_folio(object);
 
-	if (unlikely(!folio_test_slab(folio)))
+	if (unlikely(!folio_test_slab(folio))) {
+		BUG_ON(folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE);
+		BUG_ON(object != folio_address(folio));
 		return folio_size(folio);
+	}
 
 	return slab_ksize(folio_slab(folio)->slab_cache);
 }
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 13/17] mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace()
  2022-08-17 10:18 ` [PATCH v4 13/17] mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace() Hyeonggon Yoo
@ 2022-08-23 15:04   ` Vlastimil Babka
  2022-08-24  3:54     ` Hyeonggon Yoo
  0 siblings, 1 reply; 29+ messages in thread
From: Vlastimil Babka @ 2022-08-23 15:04 UTC (permalink / raw)
  To: Hyeonggon Yoo, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin
  Cc: linux-mm, linux-kernel

On 8/17/22 12:18, Hyeonggon Yoo wrote:
> This patch does:

I've removed this line locally and re-indented the rest.

> 	- Despite its name, kmem_cache_alloc[_node]_trace() is hook for
> 	  inlined kmalloc. So rename it to kmalloc[_node]_trace().
> 
> 	- Move its implementation to slab_common.c by using
>           __kmem_cache_alloc_node(), but keep CONFIG_TRACING=n variants to
> 	  save a function call when CONFIG_TRACING=n.
> 
> 	- Use __assume_kmalloc_alignment for kmalloc[_node]_trace
> 	  instead of __assume_slab_alignment. Generally kmalloc has
> 	  larger alignment requirements.
> 
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
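
For readers following the rename: the "hook for inlined kmalloc" above is the
compile-time-constant-size path in include/linux/slab.h, which after this
series looks roughly like this (simplified from memory; the CONFIG_SLOB ifdefs
and the __alloc_size/alignment annotations are dropped, so treat it as an
illustration rather than an exact copy):

static __always_inline void *kmalloc(size_t size, gfp_t flags)
{
	if (__builtin_constant_p(size)) {
		unsigned int index;

		if (size > KMALLOC_MAX_CACHE_SIZE)
			return kmalloc_large(size, flags);

		index = kmalloc_index(size);
		if (!index)
			return ZERO_SIZE_PTR;

		return kmalloc_trace(kmalloc_caches[kmalloc_type(flags)][index],
				     flags, size);
	}
	return __kmalloc(size, flags);
}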

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 17/17] mm/sl[au]b: check if large object is valid in __ksize()
  2022-08-17 10:18 ` [PATCH v4 17/17] mm/sl[au]b: check if large object is valid in __ksize() Hyeonggon Yoo
@ 2022-08-23 15:12   ` Vlastimil Babka
  2022-08-24  3:52     ` Hyeonggon Yoo
  0 siblings, 1 reply; 29+ messages in thread
From: Vlastimil Babka @ 2022-08-23 15:12 UTC (permalink / raw)
  To: Hyeonggon Yoo, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin
  Cc: linux-mm, linux-kernel, Marco Elver

On 8/17/22 12:18, Hyeonggon Yoo wrote:
> If address of large object is not beginning of folio or size of
> the folio is too small, it must be invalid. BUG() in such cases.
> 
> Cc: Marco Elver <elver@google.com>
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/slab_common.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 9c273a5fb0d7..98d029212682 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -984,8 +984,11 @@ size_t __ksize(const void *object)
>  
>  	folio = virt_to_folio(object);
>  
> -	if (unlikely(!folio_test_slab(folio)))
> +	if (unlikely(!folio_test_slab(folio))) {
> +		BUG_ON(folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE);
> +		BUG_ON(object != folio_address(folio));
>  		return folio_size(folio);
> +	}
>  
>  	return slab_ksize(folio_slab(folio)->slab_cache);
>  }

In light of latest Linus' rant on BUG_ON() [1] I'm changing it to WARN_ON
and return 0, as it was in v3.

[1] https://lore.kernel.org/all/CAHk-=wiEAH+ojSpAgx_Ep=NKPWHU8AdO3V56BXcCsU97oYJ1EA@mail.gmail.com/


diff --git a/mm/slab_common.c b/mm/slab_common.c
index 98d029212682..a80c3a5e194d 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -985,8 +985,10 @@ size_t __ksize(const void *object)
 	folio = virt_to_folio(object);
 
 	if (unlikely(!folio_test_slab(folio))) {
-		BUG_ON(folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE);
-		BUG_ON(object != folio_address(folio));
+		if (WARN_ON(folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE))
+			return 0;
+		if (WARN_ON(object != folio_address(folio)))
+			return 0;
 		return folio_size(folio);
 	}
 


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 00/17] common kmalloc v4
  2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
                   ` (16 preceding siblings ...)
  2022-08-17 10:18 ` [PATCH v4 17/17] mm/sl[au]b: check if large object is valid in __ksize() Hyeonggon Yoo
@ 2022-08-23 15:16 ` Vlastimil Babka
  2022-08-24  3:58   ` Hyeonggon Yoo
  17 siblings, 1 reply; 29+ messages in thread
From: Vlastimil Babka @ 2022-08-23 15:16 UTC (permalink / raw)
  To: Hyeonggon Yoo, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin
  Cc: linux-mm, linux-kernel

On 8/17/22 12:18, Hyeonggon Yoo wrote:
> v3: https://lore.kernel.org/lkml/20220712133946.307181-1-42.hyeyoo@gmail.com/
> 
> Hello, this is common kmalloc v4.
> Please review and consider applying.

Thanks, added to slab.git for-6.1/common_kmalloc and merged to for-next!


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 17/17] mm/sl[au]b: check if large object is valid in __ksize()
  2022-08-23 15:12   ` Vlastimil Babka
@ 2022-08-24  3:52     ` Hyeonggon Yoo
  0 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-24  3:52 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, linux-mm, linux-kernel,
	Marco Elver

On Tue, Aug 23, 2022 at 05:12:01PM +0200, Vlastimil Babka wrote:
> On 8/17/22 12:18, Hyeonggon Yoo wrote:
> > If address of large object is not beginning of folio or size of
> > the folio is too small, it must be invalid. BUG() in such cases.
> > 
> > Cc: Marco Elver <elver@google.com>
> > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> > ---
> >  mm/slab_common.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 9c273a5fb0d7..98d029212682 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -984,8 +984,11 @@ size_t __ksize(const void *object)
> >  
> >  	folio = virt_to_folio(object);
> >  
> > -	if (unlikely(!folio_test_slab(folio)))
> > +	if (unlikely(!folio_test_slab(folio))) {
> > +		BUG_ON(folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE);
> > +		BUG_ON(object != folio_address(folio));
> >  		return folio_size(folio);
> > +	}
> >  
> >  	return slab_ksize(folio_slab(folio)->slab_cache);
> >  }
> 
> In light of latest Linus' rant on BUG_ON() [1] I'm changing it to WARN_ON
> and return 0, as it was in v3.
> 
> [1] https://lore.kernel.org/all/CAHk-=wiEAH+ojSpAgx_Ep=NKPWHU8AdO3V56BXcCsU97oYJ1EA@mail.gmail.com/

Okay. I'm fine with that.

> 
> 
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 98d029212682..a80c3a5e194d 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -985,8 +985,10 @@ size_t __ksize(const void *object)
>  	folio = virt_to_folio(object);
>  
>  	if (unlikely(!folio_test_slab(folio))) {
> -		BUG_ON(folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE);
> -		BUG_ON(object != folio_address(folio));
> +		if (WARN_ON(folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE))
> +			return 0;
> +		if (WARN_ON(object != folio_address(folio)))
> +			return 0;
>  		return folio_size(folio);
>  	}
>  
> 

-- 
Thanks,
Hyeonggon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 13/17] mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace()
  2022-08-23 15:04   ` Vlastimil Babka
@ 2022-08-24  3:54     ` Hyeonggon Yoo
  0 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-24  3:54 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, linux-mm, linux-kernel

On Tue, Aug 23, 2022 at 05:04:36PM +0200, Vlastimil Babka wrote:
> On 8/17/22 12:18, Hyeonggon Yoo wrote:
> > This patch does:
> 
> I've removed this line locally and re-indented the rest.

Ah, thanks. looks better.

> > 	- Despite its name, kmem_cache_alloc[_node]_trace() is hook for
> > 	  inlined kmalloc. So rename it to kmalloc[_node]_trace().
> > 
> > 	- Move its implementation to slab_common.c by using
> >           __kmem_cache_alloc_node(), but keep CONFIG_TRACING=n variants to
> > 	  save a function call when CONFIG_TRACING=n.
> > 
> > 	- Use __assume_kmalloc_alignment for kmalloc[_node]_trace
> > 	  instead of __assume_slab_alignment. Generally kmalloc has
> > 	  larger alignment requirements.
> > 
> > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> 
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!

-- 
Thanks,
Hyeonggon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 00/17] common kmalloc v4
  2022-08-23 15:16 ` [PATCH v4 00/17] common kmalloc v4 Vlastimil Babka
@ 2022-08-24  3:58   ` Hyeonggon Yoo
  0 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-08-24  3:58 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, linux-mm, linux-kernel

On Tue, Aug 23, 2022 at 05:16:17PM +0200, Vlastimil Babka wrote:
> On 8/17/22 12:18, Hyeonggon Yoo wrote:
> > v3: https://lore.kernel.org/lkml/20220712133946.307181-1-42.hyeyoo@gmail.com/
> > 
> > Hello, this is common kmalloc v4.
> > Please review and consider applying.
> 
> Thanks, added to slab.git for-6.1/common_kmalloc and merged to for-next!

Thanks!

But please see these:
	https://lore.kernel.org/linux-mm/YwWfr8ATVx2Ag94z@hyeyoo/
	https://lore.kernel.org/lkml/20220824134530.1b10e768@canb.auug.org.au/T/#u

Fixed those, so please pull this:
	https://github.com/hygoni/linux.git slab-common-v4r1

git range-diff	for-6.1/common_kmalloc~17...for-6.1/common_kmalloc \
		slab-common-v4r1~17...slab-common-v4r1:

 1:  0276f0da97e3 =  1:  0276f0da97e3 mm/slab: move NUMA-related code to __do_cache_alloc()
 2:  d5ea00e8d8c9 =  2:  d5ea00e8d8c9 mm/slab: cleanup slab_alloc() and slab_alloc_node()
 3:  48c55c42e6b8 =  3:  48c55c42e6b8 mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions
 4:  cd8523b488ec =  4:  cd8523b488ec mm/slab_common: cleanup kmalloc_track_caller()
 5:  0b92d497e03a =  5:  0b92d497e03a mm/sl[au]b: factor out __do_kmalloc_node()
 6:  d43649c0f472 =  6:  d43649c0f472 mm/slab_common: fold kmalloc_order_trace() into kmalloc_large()
 7:  cd6d756d6118 =  7:  cd6d756d6118 mm/slub: move kmalloc_large_node() to slab_common.c
 8:  fe8f3819416e !  8:  ec277200c5dd mm/slab_common: kmalloc_node: pass large requests to page allocator
    @@ mm/slab_common.c: void *kmalloc_large(size_t size, gfp_t flags)
      EXPORT_SYMBOL(kmalloc_large);

     -void *kmalloc_large_node(size_t size, gfp_t flags, int node)
    -+void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
    ++static void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
      {
      	struct page *page;
      	void *ptr = NULL;
 9:  cc40615623ed !  9:  3d1d49576f4a mm/slab_common: cleanup kmalloc_large()
    @@ mm/slab_common.c: gfp_t kmalloc_fix_flags(gfp_t flags)
     -}
     -EXPORT_SYMBOL(kmalloc_large);

    - void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
    + static void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
      {
    -@@ mm/slab_common.c: void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
    +@@ mm/slab_common.c: static void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
      	void *ptr = NULL;
      	unsigned int order = get_order(size);

    @@ mm/slab_common.c: void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int
      	flags |= __GFP_COMP;
      	page = alloc_pages_node(node, flags, order);
      	if (page) {
    -@@ mm/slab_common.c: void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
    +@@ mm/slab_common.c: static void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
      	return ptr;
      }

10:  e14d748cf9ad = 10:  d6d55b2e745a mm/slab: kmalloc: pass requests larger than order-1 page to page allocator
11:  84000279b448 = 11:  28c1aabc9f73 mm/sl[au]b: introduce common alloc/free functions without tracepoint
12:  79c7527b9805 ! 12:  7fefa4235ba9 mm/sl[au]b: generalize kmalloc subsystem
    @@ mm/slab_common.c: void free_large_kmalloc(struct folio *folio, void *object)
     +
     +/**
     + * kfree - free previously allocated memory
    -+ * @objp: pointer returned by kmalloc.
    ++ * @object: pointer returned by kmalloc.
     + *
    -+ * If @objp is NULL, no operation is performed.
    ++ * If @object is NULL, no operation is performed.
     + *
     + * Don't free memory not originally allocated by kmalloc()
     + * or you will run into trouble.
    @@ mm/slab_common.c: void free_large_kmalloc(struct folio *folio, void *object)
     +
     +/**
     + * __ksize -- Uninstrumented ksize.
    -+ * @objp: pointer to the object
    ++ * @object: pointer to the object
     + *
     + * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
     + * safety checks as ksize() with KASAN instrumentation enabled.
     + *
    -+ * Return: size of the actual memory used by @objp in bytes
    ++ * Return: size of the actual memory used by @object in bytes
     + */
     +size_t __ksize(const void *object)
     +{
    @@ mm/slab_common.c: gfp_t kmalloc_fix_flags(gfp_t flags)
       * know the allocation order to free the pages properly in kfree.
       */

    --void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
    -+void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
    +-static void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
    ++static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
      {
      	struct page *page;
      	void *ptr = NULL;
    -@@ mm/slab_common.c: void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)
    +@@ mm/slab_common.c: static void *kmalloc_large_node_notrace(size_t size, gfp_t flags, int node)

      void *kmalloc_large(size_t size, gfp_t flags)
      {
13:  31be83f97c43 = 13:  446064fdf403 mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace()
14:  583b9ef311da = 14:  c923544d6d61 mm/slab_common: unify NUMA and UMA version of tracepoints
15:  d0b3552d07e0 = 15:  72633319472e mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using
16:  0db36c104255 ! 16:  c9b5ded32cc6 mm/slab_common: move declaration of __ksize() to mm/slab.h
    @@ mm/slab_common.c: void kfree(const void *object)

     -/**
     - * __ksize -- Uninstrumented ksize.
    -- * @objp: pointer to the object
    +- * @object: pointer to the object
     - *
     - * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
     - * safety checks as ksize() with KASAN instrumentation enabled.
     - *
    -- * Return: size of the actual memory used by @objp in bytes
    +- * Return: size of the actual memory used by @object in bytes
     - */
     +/* Uninstrumented ksize. Only called by KASAN. */
      size_t __ksize(const void *object)
17:  b261334803b4 = 17:  0248c8a1af52 mm/sl[au]b: check if large object is valid in __ksize()

-- 
Thanks,
Hyeonggon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 10/17] mm/slab: kmalloc: pass requests larger than order-1 page to page allocator
  2022-08-17 10:18 ` [PATCH v4 10/17] mm/slab: kmalloc: pass requests larger than order-1 page to page allocator Hyeonggon Yoo
@ 2022-10-14 20:58   ` Guenter Roeck
  2022-10-14 23:48     ` Hyeonggon Yoo
  2022-10-15  4:34     ` [PATCH] mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation Hyeonggon Yoo
  0 siblings, 2 replies; 29+ messages in thread
From: Guenter Roeck @ 2022-10-14 20:58 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, linux-mm,
	linux-kernel

Hi,

On Wed, Aug 17, 2022 at 07:18:19PM +0900, Hyeonggon Yoo wrote:
> There is not much benefit for serving large objects in kmalloc().
> Let's pass large requests to page allocator like SLUB for better
> maintenance of common code.
> 
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> ---

This patch results in a WARNING backtrace in all mips and sparc64
emulations.

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at mm/slab_common.c:729 kmalloc_slab+0xc0/0xdc
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-11990-g9c9155a3509a #1
Stack : ffffffff 801b2a18 80dd0000 00000004 00000000 00000000 81023cd4 00000000
        81040000 811a9930 81040000 8104a628 81101833 00000001 81023c78 00000000
        00000000 00000000 80f5d858 81023b98 00000001 00000023 00000000 ffffffff
        00000000 00000064 00000002 81040000 81040000 00000001 80f5d858 000002d9
        00000000 00000000 80000000 80002000 00000000 00000000 00000000 00000000
        ...
Call Trace:
[<8010a2bc>] show_stack+0x38/0x118
[<80cf5f7c>] dump_stack_lvl+0xac/0x104
[<80130d7c>] __warn+0xe0/0x224
[<80cdba5c>] warn_slowpath_fmt+0x64/0xb8
[<8028c058>] kmalloc_slab+0xc0/0xdc

irq event stamp: 0
hardirqs last  enabled at (0): [<00000000>] 0x0
hardirqs last disabled at (0): [<00000000>] 0x0
softirqs last  enabled at (0): [<00000000>] 0x0
softirqs last disabled at (0): [<00000000>] 0x0
---[ end trace 0000000000000000 ]---

Guenter

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 10/17] mm/slab: kmalloc: pass requests larger than order-1 page to page allocator
  2022-10-14 20:58   ` Guenter Roeck
@ 2022-10-14 23:48     ` Hyeonggon Yoo
  2022-10-15 19:39       ` Vlastimil Babka
  2022-10-15  4:34     ` [PATCH] mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation Hyeonggon Yoo
  1 sibling, 1 reply; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-10-14 23:48 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, linux-mm,
	linux-kernel

On Fri, Oct 14, 2022 at 01:58:18PM -0700, Guenter Roeck wrote:
> Hi,
> 
> On Wed, Aug 17, 2022 at 07:18:19PM +0900, Hyeonggon Yoo wrote:
> > There is not much benefit for serving large objects in kmalloc().
> > Let's pass large requests to page allocator like SLUB for better
> > maintenance of common code.
> > 
> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> > ---
> 
> This patch results in a WARNING backtrace in all mips and sparc64
> emulations.
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at mm/slab_common.c:729 kmalloc_slab+0xc0/0xdc
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-11990-g9c9155a3509a #1
> Stack : ffffffff 801b2a18 80dd0000 00000004 00000000 00000000 81023cd4 00000000
>         81040000 811a9930 81040000 8104a628 81101833 00000001 81023c78 00000000
>         00000000 00000000 80f5d858 81023b98 00000001 00000023 00000000 ffffffff
>         00000000 00000064 00000002 81040000 81040000 00000001 80f5d858 000002d9
>         00000000 00000000 80000000 80002000 00000000 00000000 00000000 00000000
>         ...
> Call Trace:
> [<8010a2bc>] show_stack+0x38/0x118
> [<80cf5f7c>] dump_stack_lvl+0xac/0x104
> [<80130d7c>] __warn+0xe0/0x224
> [<80cdba5c>] warn_slowpath_fmt+0x64/0xb8
> [<8028c058>] kmalloc_slab+0xc0/0xdc
> 
> irq event stamp: 0
> hardirqs last  enabled at (0): [<00000000>] 0x0
> hardirqs last disabled at (0): [<00000000>] 0x0
> softirqs last  enabled at (0): [<00000000>] 0x0
> softirqs last disabled at (0): [<00000000>] 0x0
> ---[ end trace 0000000000000000 ]---
> 
> Guenter

Hi.

Thank you so much for this report!

Hmm, so SLAB tries to find a kmalloc cache for the freelist index array
using kmalloc_slab() directly, and that becomes problematic when the size
of the array is larger than PAGE_SIZE * 2.
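
For context, the check in kmalloc_slab() that fires at mm/slab_common.c:729
is roughly this (paraphrasing from memory, not the exact code):

	if (WARN_ON_ONCE(size > KMALLOC_MAX_CACHE_SIZE))
		return NULL;	/* no kmalloc cache covers such sizes anymore */

and SLAB's off-slab freelist lookup can now ask for an array bigger than that.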

Will send a fix soon.

-- 
Thanks,
Hyeonggon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation
  2022-10-14 20:58   ` Guenter Roeck
  2022-10-14 23:48     ` Hyeonggon Yoo
@ 2022-10-15  4:34     ` Hyeonggon Yoo
  1 sibling, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-10-15  4:34 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Roman Gushchin, linux-mm,
	linux-kernel

After commit d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than
order-1 page to page allocator"), SLAB passes large ( > PAGE_SIZE * 2)
requests to buddy like SLUB does.

SLAB has been using kmalloc caches to allocate the freelist_idx_t array
for off-slab caches. But after the commit, freelist_size can be bigger
than KMALLOC_MAX_CACHE_SIZE.

Instead of keeping a pointer to the kmalloc cache, use kmalloc_node() and
only check whether the kmalloc cache is off-slab during
calculate_slab_order(). If freelist_size > KMALLOC_MAX_CACHE_SIZE, no
looping condition can happen because the freelist_idx_t array is
allocated directly from the buddy allocator.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator")
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---

@Guenter:
	This fixes the issue on my emulation.
	Can you please test this on your environment?

 include/linux/slab_def.h |  1 -
 mm/slab.c                | 37 +++++++++++++++++++------------------
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index e24c9aff6fed..f0ffad6a3365 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -33,7 +33,6 @@ struct kmem_cache {
 
 	size_t colour;			/* cache colouring range */
 	unsigned int colour_off;	/* colour offset */
-	struct kmem_cache *freelist_cache;
 	unsigned int freelist_size;
 
 	/* constructor func */
diff --git a/mm/slab.c b/mm/slab.c
index a5486ff8362a..d1f6e2c64c2e 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1619,7 +1619,7 @@ static void slab_destroy(struct kmem_cache *cachep, struct slab *slab)
 	 * although actual page can be freed in rcu context
 	 */
 	if (OFF_SLAB(cachep))
-		kmem_cache_free(cachep->freelist_cache, freelist);
+		kfree(freelist);
 }
 
 /*
@@ -1671,21 +1671,27 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 		if (flags & CFLGS_OFF_SLAB) {
 			struct kmem_cache *freelist_cache;
 			size_t freelist_size;
+			size_t freelist_cache_size;
 
 			freelist_size = num * sizeof(freelist_idx_t);
-			freelist_cache = kmalloc_slab(freelist_size, 0u);
-			if (!freelist_cache)
-				continue;
-
-			/*
-			 * Needed to avoid possible looping condition
-			 * in cache_grow_begin()
-			 */
-			if (OFF_SLAB(freelist_cache))
-				continue;
+			if (freelist_size > KMALLOC_MAX_CACHE_SIZE) {
+				freelist_cache_size = PAGE_SIZE << get_order(freelist_size);
+			} else {
+				freelist_cache = kmalloc_slab(freelist_size, 0u);
+				if (!freelist_cache)
+					continue;
+				freelist_cache_size = freelist_cache->size;
+
+				/*
+				 * Needed to avoid possible looping condition
+				 * in cache_grow_begin()
+				 */
+				if (OFF_SLAB(freelist_cache))
+					continue;
+			}
 
 			/* check if off slab has enough benefit */
-			if (freelist_cache->size > cachep->size / 2)
+			if (freelist_cache_size > cachep->size / 2)
 				continue;
 		}
 
@@ -2061,11 +2067,6 @@ int __kmem_cache_create(struct kmem_cache *cachep, slab_flags_t flags)
 		cachep->flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
 #endif
 
-	if (OFF_SLAB(cachep)) {
-		cachep->freelist_cache =
-			kmalloc_slab(cachep->freelist_size, 0u);
-	}
-
 	err = setup_cpu_cache(cachep, gfp);
 	if (err) {
 		__kmem_cache_release(cachep);
@@ -2292,7 +2293,7 @@ static void *alloc_slabmgmt(struct kmem_cache *cachep,
 		freelist = NULL;
 	else if (OFF_SLAB(cachep)) {
 		/* Slab management obj is off-slab. */
-		freelist = kmem_cache_alloc_node(cachep->freelist_cache,
+		freelist = kmalloc_node(cachep->freelist_size,
 					      local_flags, nodeid);
 	} else {
 		/* We will use last bytes at the slab for freelist */
-- 
2.32.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 10/17] mm/slab: kmalloc: pass requests larger than order-1 page to page allocator
  2022-10-14 23:48     ` Hyeonggon Yoo
@ 2022-10-15 19:39       ` Vlastimil Babka
  2022-10-16  9:10         ` Hyeonggon Yoo
  0 siblings, 1 reply; 29+ messages in thread
From: Vlastimil Babka @ 2022-10-15 19:39 UTC (permalink / raw)
  To: Hyeonggon Yoo, Guenter Roeck
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, linux-mm, linux-kernel

On 10/15/22 01:48, Hyeonggon Yoo wrote:
> On Fri, Oct 14, 2022 at 01:58:18PM -0700, Guenter Roeck wrote:
>> Hi,
>> 
>> On Wed, Aug 17, 2022 at 07:18:19PM +0900, Hyeonggon Yoo wrote:
>> > There is not much benefit for serving large objects in kmalloc().
>> > Let's pass large requests to page allocator like SLUB for better
>> > maintenance of common code.
>> > 
>> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
>> > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>> > ---
>> 
>> This patch results in a WARNING backtrace in all mips and sparc64
>> emulations.
>> 
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 0 at mm/slab_common.c:729 kmalloc_slab+0xc0/0xdc
>> Modules linked in:
>> CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-11990-g9c9155a3509a #1
>> Stack : ffffffff 801b2a18 80dd0000 00000004 00000000 00000000 81023cd4 00000000
>>         81040000 811a9930 81040000 8104a628 81101833 00000001 81023c78 00000000
>>         00000000 00000000 80f5d858 81023b98 00000001 00000023 00000000 ffffffff
>>         00000000 00000064 00000002 81040000 81040000 00000001 80f5d858 000002d9
>>         00000000 00000000 80000000 80002000 00000000 00000000 00000000 00000000
>>         ...
>> Call Trace:
>> [<8010a2bc>] show_stack+0x38/0x118
>> [<80cf5f7c>] dump_stack_lvl+0xac/0x104
>> [<80130d7c>] __warn+0xe0/0x224
>> [<80cdba5c>] warn_slowpath_fmt+0x64/0xb8
>> [<8028c058>] kmalloc_slab+0xc0/0xdc
>> 
>> irq event stamp: 0
>> hardirqs last  enabled at (0): [<00000000>] 0x0
>> hardirqs last disabled at (0): [<00000000>] 0x0
>> softirqs last  enabled at (0): [<00000000>] 0x0
>> softirqs last disabled at (0): [<00000000>] 0x0
>> ---[ end trace 0000000000000000 ]---
>> 
>> Guenter
> 
> Hi.
> 
> Thank you so much for this report!
> 
> Hmm so SLAB tries to find kmalloc cache for freelist index array using
> kmalloc_slab() directly, and it becomes problematic when size of the
> array is larger than PAGE_SIZE * 2.

Hmm interesting, did you find out how exactly that can happen in practice,
or what's special about mips and sparc64 here? Because normally
calculate_slab_order() will only go up to slab_max_order, which AFAICS can
only go up to SLAB_MAX_ORDER_HI, thus 1, unless there's a boot command line
override.

And if we have two pages for objects, surely even with small objects they
can't be smaller than freelist_idx_t, so if the number of objects fits into
two pages (order 1), then the freelist array should also fit in two pages?

Thanks,
Vlastimil

> Will send a fix soon.
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 10/17] mm/slab: kmalloc: pass requests larger than order-1 page to page allocator
  2022-10-15 19:39       ` Vlastimil Babka
@ 2022-10-16  9:10         ` Hyeonggon Yoo
  0 siblings, 0 replies; 29+ messages in thread
From: Hyeonggon Yoo @ 2022-10-16  9:10 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Guenter Roeck, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, linux-mm,
	linux-kernel

On Sat, Oct 15, 2022 at 09:39:08PM +0200, Vlastimil Babka wrote:
> On 10/15/22 01:48, Hyeonggon Yoo wrote:
> > On Fri, Oct 14, 2022 at 01:58:18PM -0700, Guenter Roeck wrote:
> >> Hi,
> >> 
> >> On Wed, Aug 17, 2022 at 07:18:19PM +0900, Hyeonggon Yoo wrote:
> >> > There is not much benefit for serving large objects in kmalloc().
> >> > Let's pass large requests to page allocator like SLUB for better
> >> > maintenance of common code.
> >> > 
> >> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> >> > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> >> > ---
> >> 
> >> This patch results in a WARNING backtrace in all mips and sparc64
> >> emulations.
> >> 
> >> ------------[ cut here ]------------
> >> WARNING: CPU: 0 PID: 0 at mm/slab_common.c:729 kmalloc_slab+0xc0/0xdc
> >> Modules linked in:
> >> CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-11990-g9c9155a3509a #1
> >> Stack : ffffffff 801b2a18 80dd0000 00000004 00000000 00000000 81023cd4 00000000
> >>         81040000 811a9930 81040000 8104a628 81101833 00000001 81023c78 00000000
> >>         00000000 00000000 80f5d858 81023b98 00000001 00000023 00000000 ffffffff
> >>         00000000 00000064 00000002 81040000 81040000 00000001 80f5d858 000002d9
> >>         00000000 00000000 80000000 80002000 00000000 00000000 00000000 00000000
> >>         ...
> >> Call Trace:
> >> [<8010a2bc>] show_stack+0x38/0x118
> >> [<80cf5f7c>] dump_stack_lvl+0xac/0x104
> >> [<80130d7c>] __warn+0xe0/0x224
> >> [<80cdba5c>] warn_slowpath_fmt+0x64/0xb8
> >> [<8028c058>] kmalloc_slab+0xc0/0xdc
> >> 
> >> irq event stamp: 0
> >> hardirqs last  enabled at (0): [<00000000>] 0x0
> >> hardirqs last disabled at (0): [<00000000>] 0x0
> >> softirqs last  enabled at (0): [<00000000>] 0x0
> >> softirqs last disabled at (0): [<00000000>] 0x0
> >> ---[ end trace 0000000000000000 ]---
> >> 
> >> Guenter
> > 
> > Hi.
> > 
> > Thank you so much for this report!
> > 
> > Hmm so SLAB tries to find kmalloc cache for freelist index array using
> > kmalloc_slab() directly, and it becomes problematic when size of the
> > array is larger than PAGE_SIZE * 2.
> 
> Hmm interesting, did you find out how exactly that can happen in practice,

> or what's special about mips and sparc64 here?

IIUC if the page size is large, the number of objects per slab is quite
large, so the possibility of failing to use an objfreelist slab is higher,
and then it tries to use an off-slab freelist.

> Because normally
> calculate_slab_order() will only go up to slab_max_order, which AFAICS can
> only go up to SLAB_MAX_ORDER_HI, thus 1, unless there's a boot command line
> override.

AFAICS with the mips default configuration and without setting
slab_max_order, SLAB does not actually use too big a freelist index array.

But it hits the warning because of some tricky logic.

For example, if this condition is true:

>	if (freelist_cache->size > cachep->size / 2)
>		continue;

or this one (before kmalloc is up, in the case of kmem_cache):
>	freelist_cache = kmalloc_slab(freelist_size, 0u);
>       if (!freelist_cache)
>		continue;

then it keeps increasing gfporder, regardless of slab_max_order, until
'num' becomes larger than SLAB_OBJ_MAX_NUM.
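
A stand-alone sketch (hypothetical numbers and macro values, not the real
calculate_slab_order()) of how 'continue' keeps bumping the order while
only the object-count ceiling ends the loop:

#include <stdio.h>

#define MAX_ORDER	10	/* assumed loop bound */
#define MAX_OBJS	255	/* assumed index-width ceiling */
#define SLAB_MAX_ORDER	1	/* the preferred cap, never checked below */

int main(void)
{
	unsigned long page_size = 4096, obj_size = 256;	/* hypothetical */

	for (unsigned int order = 0; order <= MAX_ORDER; order++) {
		unsigned long num = (page_size << order) / obj_size;

		if (num > MAX_OBJS) {
			printf("order %u: num=%lu, too many objects, stop\n",
			       order, num);
			break;
		}
		/* pretend the off-slab freelist checks keep failing */
		printf("order %u: num=%lu, checks failed, continue\n",
		       order, num);
		/* note: order is never compared against SLAB_MAX_ORDER */
	}
	return 0;
}

With these made-up numbers the order grows to 4 before stopping, even
though SLAB_MAX_ORDER here is 1.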

I think the change below would be more robust.

diff --git a/mm/slab.c b/mm/slab.c
index d1f6e2c64c2e..1321aca1887c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1679,7 +1679,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 			} else {
 				freelist_cache = kmalloc_slab(freelist_size, 0u);
 				if (!freelist_cache)
-					continue;
+					break;
 				freelist_cache_size = freelist_cache->size;
 
 				/*
@@ -1692,7 +1692,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 
 			/* check if off slab has enough benefit */
 			if (freelist_cache_size > cachep->size / 2)
-				continue;
+				break;
 		}
 
 		/* Found something acceptable - save it away */
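
That is, once the off-slab freelist checks fail, stop growing gfporder
rather than retrying at ever larger orders.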


> And if we have two pages for objects, surely even with small objects they
> can't be smaller than freelist_idx_t, so if the number of objects fits into
> two pages (order 1), then the freelist array should also fit in two pages?

That's right, but under certain conditions it seems to go larger than
slab_max_order (from code inspection).

> 
> Thanks,
> Vlastimil
> 
> > Will send a fix soon.
> > 

-- 
Thanks,
Hyeonggon


end of thread, other threads:[~2022-10-16  9:10 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-17 10:18 [PATCH v4 00/17] common kmalloc v4 Hyeonggon Yoo
2022-08-17 10:18 ` [PATCH v4 01/17] mm/slab: move NUMA-related code to __do_cache_alloc() Hyeonggon Yoo
2022-08-17 10:18 ` [PATCH v4 02/17] mm/slab: cleanup slab_alloc() and slab_alloc_node() Hyeonggon Yoo
2022-08-17 10:18 ` [PATCH v4 03/17] mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions Hyeonggon Yoo
2022-08-17 10:18 ` [PATCH v4 04/17] mm/slab_common: cleanup kmalloc_track_caller() Hyeonggon Yoo
2022-08-17 10:18 ` [PATCH v4 05/17] mm/sl[au]b: factor out __do_kmalloc_node() Hyeonggon Yoo
2022-08-17 10:18 ` [PATCH v4 06/17] mm/slab_common: fold kmalloc_order_trace() into kmalloc_large() Hyeonggon Yoo
2022-08-17 10:18 ` [PATCH v4 07/17] mm/slub: move kmalloc_large_node() to slab_common.c Hyeonggon Yoo
2022-08-17 10:18 ` [PATCH v4 08/17] mm/slab_common: kmalloc_node: pass large requests to page allocator Hyeonggon Yoo
2022-08-17 10:18 ` [PATCH v4 09/17] mm/slab_common: cleanup kmalloc_large() Hyeonggon Yoo
2022-08-17 10:18 ` [PATCH v4 10/17] mm/slab: kmalloc: pass requests larger than order-1 page to page allocator Hyeonggon Yoo
2022-10-14 20:58   ` Guenter Roeck
2022-10-14 23:48     ` Hyeonggon Yoo
2022-10-15 19:39       ` Vlastimil Babka
2022-10-16  9:10         ` Hyeonggon Yoo
2022-10-15  4:34     ` [PATCH] mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation Hyeonggon Yoo
2022-08-17 10:18 ` [PATCH v4 11/17] mm/sl[au]b: introduce common alloc/free functions without tracepoint Hyeonggon Yoo
2022-08-17 10:18 ` [PATCH v4 12/17] mm/sl[au]b: generalize kmalloc subsystem Hyeonggon Yoo
2022-08-17 10:18 ` [PATCH v4 13/17] mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace() Hyeonggon Yoo
2022-08-23 15:04   ` Vlastimil Babka
2022-08-24  3:54     ` Hyeonggon Yoo
2022-08-17 10:18 ` [PATCH v4 14/17] mm/slab_common: unify NUMA and UMA version of tracepoints Hyeonggon Yoo
2022-08-17 10:18 ` [PATCH v4 15/17] mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using Hyeonggon Yoo
2022-08-17 10:18 ` [PATCH v4 16/17] mm/slab_common: move declaration of __ksize() to mm/slab.h Hyeonggon Yoo
2022-08-17 10:18 ` [PATCH v4 17/17] mm/sl[au]b: check if large object is valid in __ksize() Hyeonggon Yoo
2022-08-23 15:12   ` Vlastimil Babka
2022-08-24  3:52     ` Hyeonggon Yoo
2022-08-23 15:16 ` [PATCH v4 00/17] common kmalloc v4 Vlastimil Babka
2022-08-24  3:58   ` Hyeonggon Yoo
