* [PATCH] mm/kmemleak: Add cond_resched() to kmemleak_free_percpu()
@ 2023-11-27 19:41 Waiman Long
  2023-11-28 16:04 ` Catalin Marinas
  0 siblings, 1 reply; 5+ messages in thread
From: Waiman Long @ 2023-11-27 19:41 UTC (permalink / raw)
  To: Catalin Marinas, Andrew Morton; +Cc: linux-mm, linux-kernel, Waiman Long

It was found that on systems with a large number of CPUs, the following
soft lockup splat might sometimes happen:

[ 2656.001617] watchdog: BUG: soft lockup - CPU#364 stuck for 21s! [ksoftirqd/364:2206]
  :
[ 2656.141194] RIP: 0010:_raw_spin_unlock_irqrestore+0x3d/0x70
  :
[ 2656.241214] Call Trace:
[ 2656.243971]  <IRQ>
[ 2656.246237]  ? show_trace_log_lvl+0x1c4/0x2df
[ 2656.251152]  ? show_trace_log_lvl+0x1c4/0x2df
[ 2656.256066]  ? kmemleak_free_percpu+0x11f/0x1f0
[ 2656.261173]  ? watchdog_timer_fn+0x379/0x470
[ 2656.265984]  ? __pfx_watchdog_timer_fn+0x10/0x10
[ 2656.271179]  ? __hrtimer_run_queues+0x5f3/0xd00
[ 2656.276283]  ? __pfx___hrtimer_run_queues+0x10/0x10
[ 2656.281783]  ? ktime_get_update_offsets_now+0x95/0x2c0
[ 2656.287573]  ? ktime_get_update_offsets_now+0xdd/0x2c0
[ 2656.293380]  ? hrtimer_interrupt+0x2e9/0x780
[ 2656.298221]  ? __sysvec_apic_timer_interrupt+0x184/0x640
[ 2656.304211]  ? sysvec_apic_timer_interrupt+0x8e/0xc0
[ 2656.309807]  </IRQ>
[ 2656.312169]  <TASK>
[ 2656.326110]  kmemleak_free_percpu+0x11f/0x1f0
[ 2656.331015]  free_percpu.part.0+0x1b/0xe70
[ 2656.335635]  free_vfsmnt+0xb9/0x100
[ 2656.339567]  rcu_do_batch+0x3c8/0xe30
[ 2656.363693]  rcu_core+0x3de/0x5a0
[ 2656.367433]  __do_softirq+0x2d0/0x9a8
[ 2656.381119]  run_ksoftirqd+0x36/0x60
[ 2656.385145]  smpboot_thread_fn+0x556/0x910
[ 2656.394971]  kthread+0x2a4/0x350
[ 2656.402826]  ret_from_fork+0x29/0x50
[ 2656.406861]  </TASK>

Fix this by adding a cond_resched() call in the percpu freeing loop
and by deferring the freeing of percpu kmemleak objects to a workqueue
when the function is called from a non-task context.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/kmemleak.c | 59 +++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 52 insertions(+), 7 deletions(-)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 1eacca03bedd..03385f4a8008 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -168,6 +168,14 @@ struct kmemleak_object {
 	char comm[TASK_COMM_LEN];	/* executable name */
 };
 
+/*
+ * A percpu address to be submitted to a workqueue for deferred freeing.
+ */
+struct kmemleak_percpu_addr {
+	struct work_struct work;
+	const void __percpu *ptr;
+};
+
 /* flag representing the memory block allocation status */
 #define OBJECT_ALLOCATED	(1 << 0)
 /* flag set after the first reporting of an unreferenced object */
@@ -1120,23 +1128,60 @@ void __ref kmemleak_free_part(const void *ptr, size_t size)
 }
 EXPORT_SYMBOL_GPL(kmemleak_free_part);
 
+static void __kmemleak_free_percpu(const void __percpu *ptr)
+{
+	unsigned int cpu;
+
+	for_each_possible_cpu(cpu) {
+		delete_object_full((unsigned long)per_cpu_ptr(ptr, cpu));
+		if (in_task())
+			cond_resched();
+	}
+}
+
+/*
+ * Work function for deferred freeing of kmemleak objects associated with
+ * a freed percpu memory block.
+ */
+static void kmemleak_free_percpu_workfn(struct work_struct *work)
+{
+	struct kmemleak_percpu_addr *addr;
+
+	addr = container_of(work, struct kmemleak_percpu_addr, work);
+	__kmemleak_free_percpu(addr->ptr);
+	kfree(addr);
+}
+
 /**
  * kmemleak_free_percpu - unregister a previously registered __percpu object
  * @ptr:	__percpu pointer to beginning of the object
  *
  * This function is called from the kernel percpu allocator when an object
- * (memory block) is freed (free_percpu).
+ * (memory block) is freed (free_percpu). Since this function is inherently
+ * slow especially on systems with a large number of CPUs, defer the actual
+ * removal of kmemleak objects associated with the percpu pointer to a
+ * workqueue if it is not in a task context.
  */
 void __ref kmemleak_free_percpu(const void __percpu *ptr)
 {
-	unsigned int cpu;
-
 	pr_debug("%s(0x%px)\n", __func__, ptr);
 
-	if (kmemleak_free_enabled && ptr && !IS_ERR(ptr))
-		for_each_possible_cpu(cpu)
-			delete_object_full((unsigned long)per_cpu_ptr(ptr,
-								      cpu));
+	if (!kmemleak_free_enabled || !ptr || IS_ERR(ptr))
+		return;
+
+	if (!in_task()) {
+		struct kmemleak_percpu_addr *addr;
+
+		addr = kzalloc(sizeof(*addr), GFP_ATOMIC);
+		if (addr) {
+			INIT_WORK(&addr->work, kmemleak_free_percpu_workfn);
+			addr->ptr = ptr;
+			queue_work(system_long_wq, &addr->work);
+			return;
+		}
+		/* Fallback to do direct deletion */
+	}
+	__kmemleak_free_percpu(ptr);
 }
 EXPORT_SYMBOL_GPL(kmemleak_free_percpu);
 
-- 
2.39.3


* Re: [PATCH] mm/kmemleak: Add cond_resched() to kmemleak_free_percpu()
  2023-11-27 19:41 [PATCH] mm/kmemleak: Add cond_resched() to kmemleak_free_percpu() Waiman Long
@ 2023-11-28 16:04 ` Catalin Marinas
  2023-11-29  2:35   ` Waiman Long
  2023-11-29 22:57   ` Waiman Long
  0 siblings, 2 replies; 5+ messages in thread
From: Catalin Marinas @ 2023-11-28 16:04 UTC (permalink / raw)
  To: Waiman Long; +Cc: Andrew Morton, linux-mm, linux-kernel

On Mon, Nov 27, 2023 at 02:41:53PM -0500, Waiman Long wrote:
>  /**
>   * kmemleak_free_percpu - unregister a previously registered __percpu object
>   * @ptr:	__percpu pointer to beginning of the object
>   *
>   * This function is called from the kernel percpu allocator when an object
> - * (memory block) is freed (free_percpu).
> + * (memory block) is freed (free_percpu). Since this function is inherently
> + * slow especially on systems with a large number of CPUs, defer the actual
> + * removal of kmemleak objects associated with the percpu pointer to a
> + * workqueue if it is not in a task context.
>   */
>  void __ref kmemleak_free_percpu(const void __percpu *ptr)
>  {
> -	unsigned int cpu;
> -
>  	pr_debug("%s(0x%px)\n", __func__, ptr);
>  
> -	if (kmemleak_free_enabled && ptr && !IS_ERR(ptr))
> -		for_each_possible_cpu(cpu)
> -			delete_object_full((unsigned long)per_cpu_ptr(ptr,
> -								      cpu));
> +	if (!kmemleak_free_enabled || !ptr || IS_ERR(ptr))
> +		return;
> +
> +	if (!in_task()) {
> +		struct kmemleak_percpu_addr *addr;
> +
> +		addr = kzalloc(sizeof(*addr), GFP_ATOMIC);
> +		if (addr) {
> +			INIT_WORK(&addr->work, kmemleak_free_percpu_workfn);
> +			addr->ptr = ptr;
> +			queue_work(system_long_wq, &addr->work);
> +			return;
> +		}

We can't defer this freeing. It can mess up the kmemleak metadata if the
per-cpu pointer is re-allocated before kmemleak removed it from its
object tree.
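
To make the hazard concrete, a hypothetical interleaving (the function
names are from the patch above; the exact timing is illustrative, not
observed) could be:

  softirq context                       deferred work
  ---------------                       -------------
  free_percpu(ptr)
    kmemleak_free_percpu(ptr)
      queue_work(...)                   objects still in the tree
  ptr2 = alloc_percpu(...)              allocator may hand out ptr again
    kmemleak_alloc_percpu(ptr2)         creates metadata for addresses
                                        that are still in the tree
                                        kmemleak_free_percpu_workfn()
                                          __kmemleak_free_percpu(ptr)
                                          deletes metadata that now
                                          describes the live ptr2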

The problem is looking up the object tree for each per-cpu offset. We
can make the percpu pointer handling O(1) since freeing is only done by
the main __percpu pointer, so that's the only one needing a look-up. So
far the per-cpu pointers are not tracked for leaking, only scanned.

We could just add the per_cpu_ptr(ptr, 0) to the kmemleak
object_tree_root but when scanning we don't have an inverse function to
get the __percpu pointer back and calculate the pointers for the other
CPUs (well, we could with some hacks but they are probably fragile).
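
For illustration, such a hack would invert the generic SMP mapping where
per_cpu_ptr(ptr, cpu) is essentially ptr + per_cpu_offset(cpu). A
hypothetical sketch (not something the patch below relies on):

	/*
	 * Hypothetical inverse of per_cpu_ptr(): recover the __percpu
	 * cookie from the address stored in the tree. Only valid while
	 * the architecture keeps the linear offset layout, hence
	 * "fragile".
	 */
	static const void __percpu *percpu_cookie(unsigned long cpu0_addr)
	{
		return (__force const void __percpu *)
			(cpu0_addr - per_cpu_offset(0));
	}

The pointers for the other CPUs would then follow from per_cpu_ptr() on
the recovered cookie.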

What I came up with is a separate object_percpu_tree_root similar to the
object_phys_tree_root. The only reason for these additional trees is to
look up the kmemleak metadata when needed (usually freeing). They don't
contain objects that are tracked for actual leaking, only scanned. A
briefly tested patch below. I need to go through it again, update some
comments and write a commit log:

---------------------8<---------------------------------
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 1eacca03bedd..7446c9e0b8c8 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -178,6 +178,8 @@ struct kmemleak_object {
 #define OBJECT_FULL_SCAN	(1 << 3)
 /* flag set for object allocated with physical address */
 #define OBJECT_PHYS		(1 << 4)
+/* flag set for per-CPU pointers */
+#define OBJECT_PERCPU		(1 << 5)
 
 /* set when __remove_object() called */
 #define DELSTATE_REMOVED	(1 << 0)
@@ -206,6 +208,8 @@ static LIST_HEAD(mem_pool_free_list);
 static struct rb_root object_tree_root = RB_ROOT;
 /* search tree for object (with OBJECT_PHYS flag) boundaries */
 static struct rb_root object_phys_tree_root = RB_ROOT;
+/* search tree for object (with OBJECT_PERCPU flag) boundaries */
+static struct rb_root object_percpu_tree_root = RB_ROOT;
 /* protecting the access to object_list, object_tree_root (or object_phys_tree_root) */
 static DEFINE_RAW_SPINLOCK(kmemleak_lock);
 
@@ -298,7 +302,7 @@ static void hex_dump_object(struct seq_file *seq,
 	const u8 *ptr = (const u8 *)object->pointer;
 	size_t len;
 
-	if (WARN_ON_ONCE(object->flags & OBJECT_PHYS))
+	if (WARN_ON_ONCE(object->flags & (OBJECT_PHYS | OBJECT_PERCPU)))
 		return;
 
 	/* limit the number of lines to HEX_MAX_LINES */
@@ -392,6 +396,15 @@ static void dump_object_info(struct kmemleak_object *object)
 		stack_depot_print(object->trace_handle);
 }
 
+static struct rb_root *object_tree(unsigned long objflags)
+{
+	if (objflags & OBJECT_PHYS)
+		return &object_phys_tree_root;
+	if (objflags & OBJECT_PERCPU)
+		return &object_percpu_tree_root;
+	return &object_tree_root;
+}
+
 /*
  * Look-up a memory block metadata (kmemleak_object) in the object search
  * tree based on a pointer value. If alias is 0, only values pointing to the
@@ -399,10 +412,9 @@ static void dump_object_info(struct kmemleak_object *object)
  * when calling this function.
  */
 static struct kmemleak_object *__lookup_object(unsigned long ptr, int alias,
-					       bool is_phys)
+					       unsigned int objflags)
 {
-	struct rb_node *rb = is_phys ? object_phys_tree_root.rb_node :
-			     object_tree_root.rb_node;
+	struct rb_node *rb = object_tree(objflags)->rb_node;
 	unsigned long untagged_ptr = (unsigned long)kasan_reset_tag((void *)ptr);
 
 	while (rb) {
@@ -431,7 +443,7 @@ static struct kmemleak_object *__lookup_object(unsigned long ptr, int alias,
 /* Look-up a kmemleak object which allocated with virtual address. */
 static struct kmemleak_object *lookup_object(unsigned long ptr, int alias)
 {
-	return __lookup_object(ptr, alias, false);
+	return __lookup_object(ptr, alias, 0);
 }
 
 /*
@@ -544,14 +556,14 @@ static void put_object(struct kmemleak_object *object)
  * Look up an object in the object search tree and increase its use_count.
  */
 static struct kmemleak_object *__find_and_get_object(unsigned long ptr, int alias,
-						     bool is_phys)
+						     unsigned int objflags)
 {
 	unsigned long flags;
 	struct kmemleak_object *object;
 
 	rcu_read_lock();
 	raw_spin_lock_irqsave(&kmemleak_lock, flags);
-	object = __lookup_object(ptr, alias, is_phys);
+	object = __lookup_object(ptr, alias, objflags);
 	raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
 
 	/* check whether the object is still available */
@@ -565,7 +577,7 @@ static struct kmemleak_object *__find_and_get_object(unsigned long ptr, int alia
 /* Look up and get an object which allocated with virtual address. */
 static struct kmemleak_object *find_and_get_object(unsigned long ptr, int alias)
 {
-	return __find_and_get_object(ptr, alias, false);
+	return __find_and_get_object(ptr, alias, 0);
 }
 
 /*
@@ -575,9 +587,7 @@ static struct kmemleak_object *find_and_get_object(unsigned long ptr, int alias)
  */
 static void __remove_object(struct kmemleak_object *object)
 {
-	rb_erase(&object->rb_node, object->flags & OBJECT_PHYS ?
-				   &object_phys_tree_root :
-				   &object_tree_root);
+	rb_erase(&object->rb_node, object_tree(object->flags));
 	if (!(object->del_state & DELSTATE_NO_DELETE))
 		list_del_rcu(&object->object_list);
 	object->del_state |= DELSTATE_REMOVED;
@@ -585,11 +595,11 @@ static void __remove_object(struct kmemleak_object *object)
 
 static struct kmemleak_object *__find_and_remove_object(unsigned long ptr,
 							int alias,
-							bool is_phys)
+							unsigned int objflags)
 {
 	struct kmemleak_object *object;
 
-	object = __lookup_object(ptr, alias, is_phys);
+	object = __lookup_object(ptr, alias, objflags);
 	if (object)
 		__remove_object(object);
 
@@ -603,13 +613,13 @@ static struct kmemleak_object *__find_and_remove_object(unsigned long ptr,
  * by create_object().
  */
 static struct kmemleak_object *find_and_remove_object(unsigned long ptr, int alias,
-						      bool is_phys)
+						      unsigned int objflags)
 {
 	unsigned long flags;
 	struct kmemleak_object *object;
 
 	raw_spin_lock_irqsave(&kmemleak_lock, flags);
-	object = __find_and_remove_object(ptr, alias, is_phys);
+	object = __find_and_remove_object(ptr, alias, objflags);
 	raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
 
 	return object;
@@ -648,7 +658,7 @@ static struct kmemleak_object *__alloc_object(gfp_t gfp)
 }
 
 static int __link_object(struct kmemleak_object *object, unsigned long ptr,
-			 size_t size, int min_count, bool is_phys)
+			 size_t size, int min_count, unsigned int objflags)
 {
 
 	struct kmemleak_object *parent;
@@ -661,7 +671,7 @@ static int __link_object(struct kmemleak_object *object, unsigned long ptr,
 	INIT_HLIST_HEAD(&object->area_list);
 	raw_spin_lock_init(&object->lock);
 	atomic_set(&object->use_count, 1);
-	object->flags = OBJECT_ALLOCATED | (is_phys ? OBJECT_PHYS : 0);
+	object->flags = OBJECT_ALLOCATED | objflags;
 	object->pointer = ptr;
 	object->size = kfence_ksize((void *)ptr) ?: size;
 	object->excess_ref = 0;
@@ -697,12 +707,11 @@ static int __link_object(struct kmemleak_object *object, unsigned long ptr,
 	 * Only update min_addr and max_addr with object
 	 * storing virtual address.
 	 */
-	if (!is_phys) {
+	if (!(objflags & (OBJECT_PHYS | OBJECT_PERCPU))) {
 		min_addr = min(min_addr, untagged_ptr);
 		max_addr = max(max_addr, untagged_ptr + size);
 	}
-	link = is_phys ? &object_phys_tree_root.rb_node :
-		&object_tree_root.rb_node;
+	link = &object_tree(objflags)->rb_node;
 	rb_parent = NULL;
 	while (*link) {
 		rb_parent = *link;
@@ -724,8 +733,7 @@ static int __link_object(struct kmemleak_object *object, unsigned long ptr,
 		}
 	}
 	rb_link_node(&object->rb_node, rb_parent, link);
-	rb_insert_color(&object->rb_node, is_phys ? &object_phys_tree_root :
-					  &object_tree_root);
+	rb_insert_color(&object->rb_node, object_tree(objflags));
 	list_add_tail_rcu(&object->object_list, &object_list);
 
 	return 0;
@@ -737,7 +745,7 @@ static int __link_object(struct kmemleak_object *object, unsigned long ptr,
  * object_phys_tree_root).
  */
 static void __create_object(unsigned long ptr, size_t size,
-				int min_count, gfp_t gfp, bool is_phys)
+				int min_count, gfp_t gfp, unsigned int objflags)
 {
 	struct kmemleak_object *object;
 	unsigned long flags;
@@ -748,7 +756,7 @@ static void __create_object(unsigned long ptr, size_t size,
 		return;
 
 	raw_spin_lock_irqsave(&kmemleak_lock, flags);
-	ret = __link_object(object, ptr, size, min_count, is_phys);
+	ret = __link_object(object, ptr, size, min_count, objflags);
 	raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
 	if (ret)
 		mem_pool_free(object);
@@ -758,14 +766,21 @@ static void __create_object(unsigned long ptr, size_t size,
 static void create_object(unsigned long ptr, size_t size,
 			  int min_count, gfp_t gfp)
 {
-	__create_object(ptr, size, min_count, gfp, false);
+	__create_object(ptr, size, min_count, gfp, 0);
 }
 
 /* Create kmemleak object which allocated with physical address. */
 static void create_object_phys(unsigned long ptr, size_t size,
 			       int min_count, gfp_t gfp)
 {
-	__create_object(ptr, size, min_count, gfp, true);
+	__create_object(ptr, size, min_count, gfp, OBJECT_PHYS);
+}
+
+/* Create kmemleak object corresponding to a per-CPU allocation. */
+static void create_object_percpu(unsigned long ptr, size_t size,
+				 int min_count, gfp_t gfp)
+{
+	__create_object(ptr, size, min_count, gfp, OBJECT_PERCPU);
 }
 
 /*
@@ -792,11 +807,11 @@ static void __delete_object(struct kmemleak_object *object)
  * Look up the metadata (struct kmemleak_object) corresponding to ptr and
  * delete it.
  */
-static void delete_object_full(unsigned long ptr)
+static void delete_object_full(unsigned long ptr, unsigned int objflags)
 {
 	struct kmemleak_object *object;
 
-	object = find_and_remove_object(ptr, 0, false);
+	object = find_and_remove_object(ptr, 0, objflags);
 	if (!object) {
 #ifdef DEBUG
 		kmemleak_warn("Freeing unknown object at 0x%08lx\n",
@@ -812,7 +827,8 @@ static void delete_object_full(unsigned long ptr)
  * delete it. If the memory block is partially freed, the function may create
  * additional metadata for the remaining parts of the block.
  */
-static void delete_object_part(unsigned long ptr, size_t size, bool is_phys)
+static void delete_object_part(unsigned long ptr, size_t size,
+			       unsigned int objflags)
 {
 	struct kmemleak_object *object, *object_l, *object_r;
 	unsigned long start, end, flags;
@@ -826,7 +842,7 @@ static void delete_object_part(unsigned long ptr, size_t size, bool is_phys)
 		goto out;
 
 	raw_spin_lock_irqsave(&kmemleak_lock, flags);
-	object = __find_and_remove_object(ptr, 1, is_phys);
+	object = __find_and_remove_object(ptr, 1, objflags);
 	if (!object) {
 #ifdef DEBUG
 		kmemleak_warn("Partially freeing unknown object at 0x%08lx (size %zu)\n",
@@ -844,11 +860,11 @@ static void delete_object_part(unsigned long ptr, size_t size, bool is_phys)
 	end = object->pointer + object->size;
 	if ((ptr > start) &&
 	    !__link_object(object_l, start, ptr - start,
-			   object->min_count, is_phys))
+			   object->min_count, objflags))
 		object_l = NULL;
 	if ((ptr + size < end) &&
 	    !__link_object(object_r, ptr + size, end - ptr - size,
-			   object->min_count, is_phys))
+			   object->min_count, objflags))
 		object_r = NULL;
 
 unlock:
@@ -879,11 +895,11 @@ static void paint_it(struct kmemleak_object *object, int color)
 	raw_spin_unlock_irqrestore(&object->lock, flags);
 }
 
-static void paint_ptr(unsigned long ptr, int color, bool is_phys)
+static void paint_ptr(unsigned long ptr, int color, unsigned int objflags)
 {
 	struct kmemleak_object *object;
 
-	object = __find_and_get_object(ptr, 0, is_phys);
+	object = __find_and_get_object(ptr, 0, objflags);
 	if (!object) {
 		kmemleak_warn("Trying to color unknown object at 0x%08lx as %s\n",
 			      ptr,
@@ -901,16 +917,16 @@ static void paint_ptr(unsigned long ptr, int color, bool is_phys)
  */
 static void make_gray_object(unsigned long ptr)
 {
-	paint_ptr(ptr, KMEMLEAK_GREY, false);
+	paint_ptr(ptr, KMEMLEAK_GREY, 0);
 }
 
 /*
  * Mark the object as black-colored so that it is ignored from scans and
  * reporting.
  */
-static void make_black_object(unsigned long ptr, bool is_phys)
+static void make_black_object(unsigned long ptr, unsigned int objflags)
 {
-	paint_ptr(ptr, KMEMLEAK_BLACK, is_phys);
+	paint_ptr(ptr, KMEMLEAK_BLACK, objflags);
 }
 
 /*
@@ -1046,8 +1062,6 @@ EXPORT_SYMBOL_GPL(kmemleak_alloc);
 void __ref kmemleak_alloc_percpu(const void __percpu *ptr, size_t size,
 				 gfp_t gfp)
 {
-	unsigned int cpu;
-
 	pr_debug("%s(0x%px, %zu)\n", __func__, ptr, size);
 
 	/*
@@ -1055,9 +1069,7 @@ void __ref kmemleak_alloc_percpu(const void __percpu *ptr, size_t size,
 	 * (min_count is set to 0).
 	 */
 	if (kmemleak_enabled && ptr && !IS_ERR(ptr))
-		for_each_possible_cpu(cpu)
-			create_object((unsigned long)per_cpu_ptr(ptr, cpu),
-				      size, 0, gfp);
+		create_object_percpu((unsigned long)ptr, size, 0, gfp);
 }
 EXPORT_SYMBOL_GPL(kmemleak_alloc_percpu);
 
@@ -1098,7 +1110,7 @@ void __ref kmemleak_free(const void *ptr)
 	pr_debug("%s(0x%px)\n", __func__, ptr);
 
 	if (kmemleak_free_enabled && ptr && !IS_ERR(ptr))
-		delete_object_full((unsigned long)ptr);
+		delete_object_full((unsigned long)ptr, 0);
 }
 EXPORT_SYMBOL_GPL(kmemleak_free);
 
@@ -1116,7 +1128,7 @@ void __ref kmemleak_free_part(const void *ptr, size_t size)
 	pr_debug("%s(0x%px)\n", __func__, ptr);
 
 	if (kmemleak_enabled && ptr && !IS_ERR(ptr))
-		delete_object_part((unsigned long)ptr, size, false);
+		delete_object_part((unsigned long)ptr, size, 0);
 }
 EXPORT_SYMBOL_GPL(kmemleak_free_part);
 
@@ -1129,14 +1141,10 @@ EXPORT_SYMBOL_GPL(kmemleak_free_part);
  */
 void __ref kmemleak_free_percpu(const void __percpu *ptr)
 {
-	unsigned int cpu;
-
 	pr_debug("%s(0x%px)\n", __func__, ptr);
 
 	if (kmemleak_free_enabled && ptr && !IS_ERR(ptr))
-		for_each_possible_cpu(cpu)
-			delete_object_full((unsigned long)per_cpu_ptr(ptr,
-								      cpu));
+		delete_object_full((unsigned long)ptr, OBJECT_PERCPU);
 }
 EXPORT_SYMBOL_GPL(kmemleak_free_percpu);
 
@@ -1204,7 +1212,7 @@ void __ref kmemleak_ignore(const void *ptr)
 	pr_debug("%s(0x%px)\n", __func__, ptr);
 
 	if (kmemleak_enabled && ptr && !IS_ERR(ptr))
-		make_black_object((unsigned long)ptr, false);
+		make_black_object((unsigned long)ptr, 0);
 }
 EXPORT_SYMBOL(kmemleak_ignore);
 
@@ -1278,7 +1286,7 @@ void __ref kmemleak_free_part_phys(phys_addr_t phys, size_t size)
 	pr_debug("%s(0x%px)\n", __func__, &phys);
 
 	if (kmemleak_enabled)
-		delete_object_part((unsigned long)phys, size, true);
+		delete_object_part((unsigned long)phys, size, OBJECT_PHYS);
 }
 EXPORT_SYMBOL(kmemleak_free_part_phys);
 
@@ -1292,7 +1300,7 @@ void __ref kmemleak_ignore_phys(phys_addr_t phys)
 	pr_debug("%s(0x%px)\n", __func__, &phys);
 
 	if (kmemleak_enabled)
-		make_black_object((unsigned long)phys, true);
+		make_black_object((unsigned long)phys, OBJECT_PHYS);
 }
 EXPORT_SYMBOL(kmemleak_ignore_phys);
 
@@ -1303,7 +1311,7 @@ static bool update_checksum(struct kmemleak_object *object)
 {
 	u32 old_csum = object->checksum;
 
-	if (WARN_ON_ONCE(object->flags & OBJECT_PHYS))
+	if (WARN_ON_ONCE(object->flags & (OBJECT_PHYS | OBJECT_PERCPU)))
 		return false;
 
 	kasan_disable_current();
@@ -1459,7 +1467,6 @@ static void scan_object(struct kmemleak_object *object)
 {
 	struct kmemleak_scan_area *area;
 	unsigned long flags;
-	void *obj_ptr;
 
 	/*
 	 * Once the object->lock is acquired, the corresponding memory block
@@ -1472,14 +1479,27 @@ static void scan_object(struct kmemleak_object *object)
 		/* already freed object */
 		goto out;
 
-	obj_ptr = object->flags & OBJECT_PHYS ?
-		  __va((phys_addr_t)object->pointer) :
-		  (void *)object->pointer;
+	if (object->flags & OBJECT_PERCPU) {
+		unsigned int cpu;
 
-	if (hlist_empty(&object->area_list) ||
+		for_each_possible_cpu(cpu) {
+			void *start = per_cpu_ptr((void __percpu *)object->pointer, cpu);
+			void *end = start + object->size;
+
+			scan_block(start, end, object);
+
+			raw_spin_unlock_irqrestore(&object->lock, flags);
+			cond_resched();
+			raw_spin_lock_irqsave(&object->lock, flags);
+			if (!(object->flags & OBJECT_ALLOCATED))
+				break;
+		}
+	} else if (hlist_empty(&object->area_list) ||
 	    object->flags & OBJECT_FULL_SCAN) {
-		void *start = obj_ptr;
-		void *end = obj_ptr + object->size;
+		void *start = object->flags & OBJECT_PHYS ?
+				__va((phys_addr_t)object->pointer) :
+				(void *)object->pointer;
+		void *end = start + object->size;
 		void *next;
 
 		do {
@@ -1494,11 +1514,12 @@ static void scan_object(struct kmemleak_object *object)
 			cond_resched();
 			raw_spin_lock_irqsave(&object->lock, flags);
 		} while (object->flags & OBJECT_ALLOCATED);
-	} else
+	} else {
 		hlist_for_each_entry(area, &object->area_list, node)
 			scan_block((void *)area->start,
 				   (void *)(area->start + area->size),
 				   object);
+	}
 out:
 	raw_spin_unlock_irqrestore(&object->lock, flags);
 }

-- 
Catalin

* Re: [PATCH] mm/kmemleak: Add cond_resched() to kmemleak_free_percpu()
  2023-11-28 16:04 ` Catalin Marinas
@ 2023-11-29  2:35   ` Waiman Long
  2023-11-29 22:57   ` Waiman Long
  1 sibling, 0 replies; 5+ messages in thread
From: Waiman Long @ 2023-11-29  2:35 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: Andrew Morton, linux-mm, linux-kernel


On 11/28/23 11:04, Catalin Marinas wrote:
> On Mon, Nov 27, 2023 at 02:41:53PM -0500, Waiman Long wrote:
[...]
> We can't defer this freeing. It can mess up the kmemleak metadata if the
> per-cpu pointer is re-allocated before kmemleak removed it from its
> object tree.
You are right. In fact, it is possible for kmemleak_free_percpu() to be
called from softIRQ context. And if the system has hundreds of CPUs, it
will take a long time to process all the free requests.
>
> The problem is looking up the object tree for each per-cpu offset. We
> can make the percpu pointer handling O(1) since freeing is only done by
> the main __percpu pointer, so that's the only one needing a look-up. So
> far the per-cpu pointers are not tracked for leaking, only scanned.
>
> We could just add the per_cpu_ptr(ptr, 0) to the kmemleak
> object_tree_root but when scanning we don't have an inverse function to
> get the __percpu pointer back and calculate the pointers for the other
> CPUs (well, we could with some hacks but they are probably fragile).
We could keep a separate tree to track the percpu areas. We will know the
max percpu offset in each percpu area, and the base of the percpu area is
just per_cpu_ptr(0, cpu).
>
> What I came up with is a separate object_percpu_tree_root similar to the
> object_phys_tree_root. The only reason for these additional trees is to
> look up the kmemleak metadata when needed (usually freeing). They don't
> contain objects that are tracked for actual leaking, only scanned. A
> briefly tested patch below. I need to go through it again, update some
> comments and write a commit log:

That sounds like a good idea, along the lines of what I said above. I
will do a more careful review of the change tomorrow as it is getting
late for me today.

Cheers,
Longman

* Re: [PATCH] mm/kmemleak: Add cond_resched() to kmemleak_free_percpu()
  2023-11-28 16:04 ` Catalin Marinas
  2023-11-29  2:35   ` Waiman Long
@ 2023-11-29 22:57   ` Waiman Long
  2023-11-30 12:21     ` Catalin Marinas
  1 sibling, 1 reply; 5+ messages in thread
From: Waiman Long @ 2023-11-29 22:57 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: Andrew Morton, linux-mm, linux-kernel

On 11/28/23 11:04, Catalin Marinas wrote:
> On Mon, Nov 27, 2023 at 02:41:53PM -0500, Waiman Long wrote:
[...]
> We can't defer this freeing. It can mess up the kmemleak metadata if the
> per-cpu pointer is re-allocated before kmemleak removed it from its
> object tree.
>
> The problem is looking up the object tree for each per-cpu offset. We
> can make the percpu pointer handling O(1) since freeing is only done by
> the main __percpu pointer, so that's the only one needing a look-up. So
> far the per-cpu pointers are not tracked for leaking, only scanned.
>
> We could just add the per_cpu_ptr(ptr, 0) to the kmemleak
> object_tree_root but when scanning we don't have an inverse function to
> get the __percpu pointer back and calculate the pointers for the other
> CPUs (well, we could with some hacks but they are probably fragile).
>
> What I came up with is a separate object_percpu_tree_root similar to the
> object_phys_tree_root. The only reason for these additional trees is to
> look up the kmemleak metadata when needed (usually freeing). They don't
> contain objects that are tracked for actual leaking, only scanned. A
> briefly tested patch below. I need to go through it again, update some
> comments and write a commit log:
[...]

The patch looks reasonable to me. It also has the side effect of reducing
the number of kmemleak objects to track, especially on systems with a
large number of CPUs. (Each percpu allocation is now tracked by a single
kmemleak object instead of one object per possible CPU.) Of course, we
still need more testing to make sure that it won't break anything.

Cheers,
Longman

* Re: [PATCH] mm/kmemleak: Add cond_resched() to kmemleak_free_percpu()
  2023-11-29 22:57   ` Waiman Long
@ 2023-11-30 12:21     ` Catalin Marinas
  0 siblings, 0 replies; 5+ messages in thread
From: Catalin Marinas @ 2023-11-30 12:21 UTC (permalink / raw)
  To: Waiman Long; +Cc: Andrew Morton, linux-mm, linux-kernel

On Wed, Nov 29, 2023 at 05:57:11PM -0500, Waiman Long wrote:
> On 11/28/23 11:04, Catalin Marinas wrote:
> > The problem is looking up the object tree for each per-cpu offset. We
> > can make the percpu pointer handling O(1) since freeing is only done by
> > the main __percpu pointer, so that's the only one needing a look-up. So
> > far the per-cpu pointers are not tracked for leaking, only scanned.
> > 
> > We could just add the per_cpu_ptr(ptr, 0) to the kmemleak
> > object_tree_root but when scanning we don't have an inverse function to
> > get the __percpu pointer back and calculate the pointers for the other
> > CPUs (well, we could with some hacks but they are probably fragile).
> > 
> > What I came up with is a separate object_percpu_tree_root similar to the
> > object_phys_tree_root. The only reason for these additional trees is to
> > look up the kmemleak metadata when needed (usually freeing). They don't
> > contain objects that are tracked for actual leaking, only scanned. A
> > briefly tested patch below. I need to go through it again, update some
> > comments and write a commit log:
[...]
> The patch looks reasonable to me. It also has the side effect of reducing
> the number of kmemleak objects to track, especially on systems with a
> large number of CPUs. Of course, we still need more testing to make sure
> that it won't break anything.

Thanks for having a look. I'll tidy it up and post today or tomorrow. It
can stay in next for a bit before upstreaming to get some exposure (though
I'm not sure many people test -next with kmemleak enabled).

-- 
Catalin

end of thread

Thread overview: 5+ messages
2023-11-27 19:41 [PATCH] mm/kmemleak: Add cond_resched() to kmemleak_free_percpu() Waiman Long
2023-11-28 16:04 ` Catalin Marinas
2023-11-29  2:35   ` Waiman Long
2023-11-29 22:57   ` Waiman Long
2023-11-30 12:21     ` Catalin Marinas
