linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/3] mm: kmemleak: Use a memory pool for kmemleak object allocations
@ 2019-08-12 16:06 Catalin Marinas
  2019-08-12 16:06 ` [PATCH v3 1/3] mm: kmemleak: Make the tool tolerant to struct scan_area allocation failures Catalin Marinas
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Catalin Marinas @ 2019-08-12 16:06 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Michal Hocko, Matthew Wilcox, Qian Cai

Following the discussions on v2 of this patch(set) [1], this series
takes a slightly different approach:

- it implements its own simple memory pool that does not rely on the
  slab allocator

- drops the early log buffer logic entirely since it can now allocate
  metadata from the memory pool directly before kmemleak is fully
  initialised

- CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE option is renamed to
  CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE

- moves the kmemleak_init() call earlier (mm_init())

- to avoid a separate memory pool for struct scan_area, it makes the
  tool robust when such allocations fail as scan areas are rather an
  optimisation

[1] http://lkml.kernel.org/r/20190727132334.9184-1-catalin.marinas@arm.com

Catalin Marinas (3):
  mm: kmemleak: Make the tool tolerant to struct scan_area allocation
    failures
  mm: kmemleak: Simple memory allocation pool for kmemleak objects
  mm: kmemleak: Use the memory pool for early allocations

 init/main.c       |   2 +-
 lib/Kconfig.debug |  11 +-
 mm/kmemleak.c     | 325 ++++++++++++----------------------------------
 3 files changed, 91 insertions(+), 247 deletions(-)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 1/3] mm: kmemleak: Make the tool tolerant to struct scan_area allocation failures
  2019-08-12 16:06 [PATCH v3 0/3] mm: kmemleak: Use a memory pool for kmemleak object allocations Catalin Marinas
@ 2019-08-12 16:06 ` Catalin Marinas
  2019-10-03  6:13   ` Alexey Kardashevskiy
  2019-08-12 16:06 ` [PATCH v3 2/3] mm: kmemleak: Simple memory allocation pool for kmemleak objects Catalin Marinas
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Catalin Marinas @ 2019-08-12 16:06 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Michal Hocko, Matthew Wilcox, Qian Cai

Object scan areas are an optimisation aimed at reducing false positives
and slightly improving the scanning time of large objects known to
contain only a few specific pointers. If a struct scan_area allocation
fails, kmemleak can still function normally by scanning the full
object.

Introduce an OBJECT_FULL_SCAN flag and mark objects as such when
scan_area allocation fails.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 mm/kmemleak.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index f6e602918dac..5ba7fad00fda 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -168,6 +168,8 @@ struct kmemleak_object {
 #define OBJECT_REPORTED		(1 << 1)
 /* flag set to not scan the object */
 #define OBJECT_NO_SCAN		(1 << 2)
+/* flag set to fully scan the object when scan_area allocation failed */
+#define OBJECT_FULL_SCAN	(1 << 3)
 
 #define HEX_PREFIX		"    "
 /* number of bytes to print per line; must be 16 or 32 */
@@ -773,12 +775,14 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp)
 	}
 
 	area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp));
-	if (!area) {
-		pr_warn("Cannot allocate a scan area\n");
-		goto out;
-	}
 
 	spin_lock_irqsave(&object->lock, flags);
+	if (!area) {
+		pr_warn_once("Cannot allocate a scan area, scanning the full object\n");
+		/* mark the object for full scan to avoid false positives */
+		object->flags |= OBJECT_FULL_SCAN;
+		goto out_unlock;
+	}
 	if (size == SIZE_MAX) {
 		size = object->pointer + object->size - ptr;
 	} else if (ptr + size > object->pointer + object->size) {
@@ -795,7 +799,6 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp)
 	hlist_add_head(&area->node, &object->area_list);
 out_unlock:
 	spin_unlock_irqrestore(&object->lock, flags);
-out:
 	put_object(object);
 }
 
@@ -1408,7 +1411,8 @@ static void scan_object(struct kmemleak_object *object)
 	if (!(object->flags & OBJECT_ALLOCATED))
 		/* already freed object */
 		goto out;
-	if (hlist_empty(&object->area_list)) {
+	if (hlist_empty(&object->area_list) ||
+	    object->flags & OBJECT_FULL_SCAN) {
 		void *start = (void *)object->pointer;
 		void *end = (void *)(object->pointer + object->size);
 		void *next;


* [PATCH v3 2/3] mm: kmemleak: Simple memory allocation pool for kmemleak objects
  2019-08-12 16:06 [PATCH v3 0/3] mm: kmemleak: Use a memory pool for kmemleak object allocations Catalin Marinas
  2019-08-12 16:06 ` [PATCH v3 1/3] mm: kmemleak: Make the tool tolerant to struct scan_area allocation failures Catalin Marinas
@ 2019-08-12 16:06 ` Catalin Marinas
  2019-08-13  9:37   ` Catalin Marinas
  2019-08-12 16:06 ` [PATCH v3 3/3] mm: kmemleak: Use the memory pool for early allocations Catalin Marinas
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Catalin Marinas @ 2019-08-12 16:06 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Michal Hocko, Matthew Wilcox, Qian Cai

Add a memory pool for struct kmemleak_object in case the normal
kmem_cache_alloc() fails under the gfp constraints passed by the caller.
The mem_pool[] array size is currently fixed at 16000.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 mm/kmemleak.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 5ba7fad00fda..2fb86524d70b 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -180,11 +180,17 @@ struct kmemleak_object {
 #define HEX_ASCII		1
 /* max number of lines to be printed */
 #define HEX_MAX_LINES		2
+/* memory pool size */
+#define MEM_POOL_SIZE		16000
 
 /* the list of all allocated objects */
 static LIST_HEAD(object_list);
 /* the list of gray-colored objects (see color_gray comment below) */
 static LIST_HEAD(gray_list);
+/* memory pool allocation */
+static struct kmemleak_object mem_pool[MEM_POOL_SIZE];
+static int mem_pool_free_count = ARRAY_SIZE(mem_pool);
+static LIST_HEAD(mem_pool_free_list);
 /* search tree for object boundaries */
 static struct rb_root object_tree_root = RB_ROOT;
 /* rw_lock protecting the access to object_list and object_tree_root */
@@ -451,6 +457,50 @@ static int get_object(struct kmemleak_object *object)
 	return atomic_inc_not_zero(&object->use_count);
 }
 
+/*
+ * Memory pool allocation and freeing. kmemleak_lock must not be held.
+ */
+static struct kmemleak_object *mem_pool_alloc(gfp_t gfp)
+{
+	unsigned long flags;
+	struct kmemleak_object *object;
+
+	/* try the slab allocator first */
+	object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
+	if (object)
+		return object;
+
+	/* slab allocation failed, try the memory pool */
+	write_lock_irqsave(&kmemleak_lock, flags);
+	object = list_first_entry_or_null(&mem_pool_free_list,
+					  typeof(*object), object_list);
+	if (object)
+		list_del(&object->object_list);
+	else if (mem_pool_free_count)
+		object = &mem_pool[--mem_pool_free_count];
+	write_unlock_irqrestore(&kmemleak_lock, flags);
+
+	return object;
+}
+
+/*
+ * Return the object to either the slab allocator or the memory pool.
+ */
+static void mem_pool_free(struct kmemleak_object *object)
+{
+	unsigned long flags;
+
+	if (object < mem_pool || object >= mem_pool + ARRAY_SIZE(mem_pool)) {
+		kmem_cache_free(object_cache, object);
+		return;
+	}
+
+	/* add the object to the memory pool free list */
+	write_lock_irqsave(&kmemleak_lock, flags);
+	list_add(&object->object_list, &mem_pool_free_list);
+	write_unlock_irqrestore(&kmemleak_lock, flags);
+}
+
 /*
  * RCU callback to free a kmemleak_object.
  */
@@ -469,7 +519,7 @@ static void free_object_rcu(struct rcu_head *rcu)
 		hlist_del(&area->node);
 		kmem_cache_free(scan_area_cache, area);
 	}
-	kmem_cache_free(object_cache, object);
+	mem_pool_free(object);
 }
 
 /*
@@ -552,7 +602,7 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
 	struct rb_node **link, *rb_parent;
 	unsigned long untagged_ptr;
 
-	object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
+	object = mem_pool_alloc(gfp);
 	if (!object) {
 		pr_warn("Cannot allocate a kmemleak_object structure\n");
 		kmemleak_disable();


* [PATCH v3 3/3] mm: kmemleak: Use the memory pool for early allocations
  2019-08-12 16:06 [PATCH v3 0/3] mm: kmemleak: Use a memory pool for kmemleak object allocations Catalin Marinas
  2019-08-12 16:06 ` [PATCH v3 1/3] mm: kmemleak: Make the tool tolerant to struct scan_area allocation failures Catalin Marinas
  2019-08-12 16:06 ` [PATCH v3 2/3] mm: kmemleak: Simple memory allocation pool for kmemleak objects Catalin Marinas
@ 2019-08-12 16:06 ` Catalin Marinas
  2019-08-13  9:35   ` Catalin Marinas
  2019-08-13 12:35   ` Qian Cai
  2019-08-12 21:07 ` [PATCH v3 0/3] mm: kmemleak: Use a memory pool for kmemleak object allocations Andrew Morton
       [not found] ` <08847a90-c37b-890f-8d0e-3ae1c3a1dd71@mellanox.com>
  4 siblings, 2 replies; 17+ messages in thread
From: Catalin Marinas @ 2019-08-12 16:06 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Michal Hocko, Matthew Wilcox, Qian Cai

Currently kmemleak uses a static early_log buffer to trace all memory
allocation/freeing before the slab allocator is initialised. This early
log is replayed during kmemleak_init() to properly initialise the
kmemleak metadata for objects allocated up to that point. With a memory
pool that does not rely on the slab allocator, it is possible to skip
this early log entirely.

In order to remove the early logging, treat kmemleak_enabled as 1 by
default and check the kmem_cache availability directly on the
object_cache and scan_area_cache variables. The RCU callback is only
invoked after object_cache has been initialised, as there is no
concurrent list traversal before then.

In order to reduce the number of callbacks before kmemleak is fully
initialised, move the kmemleak_init() call to mm_init().

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 init/main.c       |   2 +-
 lib/Kconfig.debug |  11 +-
 mm/kmemleak.c     | 267 +++++-----------------------------------------
 3 files changed, 35 insertions(+), 245 deletions(-)

diff --git a/init/main.c b/init/main.c
index 96f8d5af52d6..ca05e3cd7ef7 100644
--- a/init/main.c
+++ b/init/main.c
@@ -556,6 +556,7 @@ static void __init mm_init(void)
 	report_meminit();
 	mem_init();
 	kmem_cache_init();
+	kmemleak_init();
 	pgtable_init();
 	debug_objects_mem_init();
 	vmalloc_init();
@@ -740,7 +741,6 @@ asmlinkage __visible void __init start_kernel(void)
 		initrd_start = 0;
 	}
 #endif
-	kmemleak_init();
 	setup_per_cpu_pageset();
 	numa_policy_init();
 	acpi_early_init();
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4d39540011e2..39df06ffd9f4 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -592,17 +592,18 @@ config DEBUG_KMEMLEAK
 	  In order to access the kmemleak file, debugfs needs to be
 	  mounted (usually at /sys/kernel/debug).
 
-config DEBUG_KMEMLEAK_EARLY_LOG_SIZE
-	int "Maximum kmemleak early log entries"
+config DEBUG_KMEMLEAK_MEM_POOL_SIZE
+	int "Kmemleak memory pool size"
 	depends on DEBUG_KMEMLEAK
 	range 200 40000
 	default 16000
 	help
 	  Kmemleak must track all the memory allocations to avoid
 	  reporting false positives. Since memory may be allocated or
-	  freed before kmemleak is initialised, an early log buffer is
-	  used to store these actions. If kmemleak reports "early log
-	  buffer exceeded", please increase this value.
+	  freed before kmemleak is fully initialised, use a static pool
+	  of metadata objects to track such callbacks. After kmemleak is
+	  fully initialised, this memory pool acts as an emergency one
+	  if slab allocations fail.
 
 config DEBUG_KMEMLEAK_TEST
 	tristate "Simple test for the kernel memory leak detector"
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 2fb86524d70b..bcb05b9b4eb4 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -180,15 +180,13 @@ struct kmemleak_object {
 #define HEX_ASCII		1
 /* max number of lines to be printed */
 #define HEX_MAX_LINES		2
-/* memory pool size */
-#define MEM_POOL_SIZE		16000
 
 /* the list of all allocated objects */
 static LIST_HEAD(object_list);
 /* the list of gray-colored objects (see color_gray comment below) */
 static LIST_HEAD(gray_list);
 /* memory pool allocation */
-static struct kmemleak_object mem_pool[MEM_POOL_SIZE];
+static struct kmemleak_object mem_pool[CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE];
 static int mem_pool_free_count = ARRAY_SIZE(mem_pool);
 static LIST_HEAD(mem_pool_free_list);
 /* search tree for object boundaries */
@@ -201,13 +199,11 @@ static struct kmem_cache *object_cache;
 static struct kmem_cache *scan_area_cache;
 
 /* set if tracing memory operations is enabled */
-static int kmemleak_enabled;
+static int kmemleak_enabled = 1;
 /* same as above but only for the kmemleak_free() callback */
-static int kmemleak_free_enabled;
+static int kmemleak_free_enabled = 1;
 /* set in the late_initcall if there were no errors */
 static int kmemleak_initialized;
-/* enables or disables early logging of the memory operations */
-static int kmemleak_early_log = 1;
 /* set if a kmemleak warning was issued */
 static int kmemleak_warning;
 /* set if a fatal kmemleak error has occurred */
@@ -235,49 +231,6 @@ static bool kmemleak_found_leaks;
 static bool kmemleak_verbose;
 module_param_named(verbose, kmemleak_verbose, bool, 0600);
 
-/*
- * Early object allocation/freeing logging. Kmemleak is initialized after the
- * kernel allocator. However, both the kernel allocator and kmemleak may
- * allocate memory blocks which need to be tracked. Kmemleak defines an
- * arbitrary buffer to hold the allocation/freeing information before it is
- * fully initialized.
- */
-
-/* kmemleak operation type for early logging */
-enum {
-	KMEMLEAK_ALLOC,
-	KMEMLEAK_ALLOC_PERCPU,
-	KMEMLEAK_FREE,
-	KMEMLEAK_FREE_PART,
-	KMEMLEAK_FREE_PERCPU,
-	KMEMLEAK_NOT_LEAK,
-	KMEMLEAK_IGNORE,
-	KMEMLEAK_SCAN_AREA,
-	KMEMLEAK_NO_SCAN,
-	KMEMLEAK_SET_EXCESS_REF
-};
-
-/*
- * Structure holding the information passed to kmemleak callbacks during the
- * early logging.
- */
-struct early_log {
-	int op_type;			/* kmemleak operation type */
-	int min_count;			/* minimum reference count */
-	const void *ptr;		/* allocated/freed memory block */
-	union {
-		size_t size;		/* memory block size */
-		unsigned long excess_ref; /* surplus reference passing */
-	};
-	unsigned long trace[MAX_TRACE];	/* stack trace */
-	unsigned int trace_len;		/* stack trace length */
-};
-
-/* early logging buffer and current position */
-static struct early_log
-	early_log[CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE] __initdata;
-static int crt_early_log __initdata;
-
 static void kmemleak_disable(void);
 
 /*
@@ -466,9 +419,13 @@ static struct kmemleak_object *mem_pool_alloc(gfp_t gfp)
 	struct kmemleak_object *object;
 
 	/* try the slab allocator first */
-	object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
-	if (object)
-		return object;
+	if (object_cache) {
+		object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
+		if (object)
+			return object;
+		else
+			WARN_ON_ONCE(1);
+	}
 
 	/* slab allocation failed, try the memory pool */
 	write_lock_irqsave(&kmemleak_lock, flags);
@@ -478,6 +435,8 @@ static struct kmemleak_object *mem_pool_alloc(gfp_t gfp)
 		list_del(&object->object_list);
 	else if (mem_pool_free_count)
 		object = &mem_pool[--mem_pool_free_count];
+	else
+		pr_warn_once("Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE\n");
 	write_unlock_irqrestore(&kmemleak_lock, flags);
 
 	return object;
@@ -537,7 +496,15 @@ static void put_object(struct kmemleak_object *object)
 	/* should only get here after delete_object was called */
 	WARN_ON(object->flags & OBJECT_ALLOCATED);
 
-	call_rcu(&object->rcu, free_object_rcu);
+	/*
+	 * It may be too early for the RCU callbacks, however, there is no
+	 * concurrent object_list traversal when !object_cache and all objects
+	 * came from the memory pool. Free the object directly.
+	 */
+	if (object_cache)
+		call_rcu(&object->rcu, free_object_rcu);
+	else
+		free_object_rcu(&object->rcu);
 }
 
 /*
@@ -741,9 +708,7 @@ static void delete_object_part(unsigned long ptr, size_t size)
 	/*
 	 * Create one or two objects that may result from the memory block
 	 * split. Note that partial freeing is only done by free_bootmem() and
-	 * this happens before kmemleak_init() is called. The path below is
-	 * only executed during early log recording in kmemleak_init(), so
-	 * GFP_KERNEL is enough.
+	 * this happens before kmemleak_init() is called.
 	 */
 	start = object->pointer;
 	end = object->pointer + object->size;
@@ -815,7 +780,7 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp)
 {
 	unsigned long flags;
 	struct kmemleak_object *object;
-	struct kmemleak_scan_area *area;
+	struct kmemleak_scan_area *area = NULL;
 
 	object = find_and_get_object(ptr, 1);
 	if (!object) {
@@ -824,7 +789,8 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp)
 		return;
 	}
 
-	area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp));
+	if (scan_area_cache)
+		area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp));
 
 	spin_lock_irqsave(&object->lock, flags);
 	if (!area) {
@@ -898,86 +864,6 @@ static void object_no_scan(unsigned long ptr)
 	put_object(object);
 }
 
-/*
- * Log an early kmemleak_* call to the early_log buffer. These calls will be
- * processed later once kmemleak is fully initialized.
- */
-static void __init log_early(int op_type, const void *ptr, size_t size,
-			     int min_count)
-{
-	unsigned long flags;
-	struct early_log *log;
-
-	if (kmemleak_error) {
-		/* kmemleak stopped recording, just count the requests */
-		crt_early_log++;
-		return;
-	}
-
-	if (crt_early_log >= ARRAY_SIZE(early_log)) {
-		crt_early_log++;
-		kmemleak_disable();
-		return;
-	}
-
-	/*
-	 * There is no need for locking since the kernel is still in UP mode
-	 * at this stage. Disabling the IRQs is enough.
-	 */
-	local_irq_save(flags);
-	log = &early_log[crt_early_log];
-	log->op_type = op_type;
-	log->ptr = ptr;
-	log->size = size;
-	log->min_count = min_count;
-	log->trace_len = __save_stack_trace(log->trace);
-	crt_early_log++;
-	local_irq_restore(flags);
-}
-
-/*
- * Log an early allocated block and populate the stack trace.
- */
-static void early_alloc(struct early_log *log)
-{
-	struct kmemleak_object *object;
-	unsigned long flags;
-	int i;
-
-	if (!kmemleak_enabled || !log->ptr || IS_ERR(log->ptr))
-		return;
-
-	/*
-	 * RCU locking needed to ensure object is not freed via put_object().
-	 */
-	rcu_read_lock();
-	object = create_object((unsigned long)log->ptr, log->size,
-			       log->min_count, GFP_ATOMIC);
-	if (!object)
-		goto out;
-	spin_lock_irqsave(&object->lock, flags);
-	for (i = 0; i < log->trace_len; i++)
-		object->trace[i] = log->trace[i];
-	object->trace_len = log->trace_len;
-	spin_unlock_irqrestore(&object->lock, flags);
-out:
-	rcu_read_unlock();
-}
-
-/*
- * Log an early allocated block and populate the stack trace.
- */
-static void early_alloc_percpu(struct early_log *log)
-{
-	unsigned int cpu;
-	const void __percpu *ptr = log->ptr;
-
-	for_each_possible_cpu(cpu) {
-		log->ptr = per_cpu_ptr(ptr, cpu);
-		early_alloc(log);
-	}
-}
-
 /**
  * kmemleak_alloc - register a newly allocated object
  * @ptr:	pointer to beginning of the object
@@ -999,8 +885,6 @@ void __ref kmemleak_alloc(const void *ptr, size_t size, int min_count,
 
 	if (kmemleak_enabled && ptr && !IS_ERR(ptr))
 		create_object((unsigned long)ptr, size, min_count, gfp);
-	else if (kmemleak_early_log)
-		log_early(KMEMLEAK_ALLOC, ptr, size, min_count);
 }
 EXPORT_SYMBOL_GPL(kmemleak_alloc);
 
@@ -1028,8 +912,6 @@ void __ref kmemleak_alloc_percpu(const void __percpu *ptr, size_t size,
 		for_each_possible_cpu(cpu)
 			create_object((unsigned long)per_cpu_ptr(ptr, cpu),
 				      size, 0, gfp);
-	else if (kmemleak_early_log)
-		log_early(KMEMLEAK_ALLOC_PERCPU, ptr, size, 0);
 }
 EXPORT_SYMBOL_GPL(kmemleak_alloc_percpu);
 
@@ -1054,11 +936,6 @@ void __ref kmemleak_vmalloc(const struct vm_struct *area, size_t size, gfp_t gfp
 		create_object((unsigned long)area->addr, size, 2, gfp);
 		object_set_excess_ref((unsigned long)area,
 				      (unsigned long)area->addr);
-	} else if (kmemleak_early_log) {
-		log_early(KMEMLEAK_ALLOC, area->addr, size, 2);
-		/* reusing early_log.size for storing area->addr */
-		log_early(KMEMLEAK_SET_EXCESS_REF,
-			  area, (unsigned long)area->addr, 0);
 	}
 }
 EXPORT_SYMBOL_GPL(kmemleak_vmalloc);
@@ -1076,8 +953,6 @@ void __ref kmemleak_free(const void *ptr)
 
 	if (kmemleak_free_enabled && ptr && !IS_ERR(ptr))
 		delete_object_full((unsigned long)ptr);
-	else if (kmemleak_early_log)
-		log_early(KMEMLEAK_FREE, ptr, 0, 0);
 }
 EXPORT_SYMBOL_GPL(kmemleak_free);
 
@@ -1096,8 +971,6 @@ void __ref kmemleak_free_part(const void *ptr, size_t size)
 
 	if (kmemleak_enabled && ptr && !IS_ERR(ptr))
 		delete_object_part((unsigned long)ptr, size);
-	else if (kmemleak_early_log)
-		log_early(KMEMLEAK_FREE_PART, ptr, size, 0);
 }
 EXPORT_SYMBOL_GPL(kmemleak_free_part);
 
@@ -1118,8 +991,6 @@ void __ref kmemleak_free_percpu(const void __percpu *ptr)
 		for_each_possible_cpu(cpu)
 			delete_object_full((unsigned long)per_cpu_ptr(ptr,
 								      cpu));
-	else if (kmemleak_early_log)
-		log_early(KMEMLEAK_FREE_PERCPU, ptr, 0, 0);
 }
 EXPORT_SYMBOL_GPL(kmemleak_free_percpu);
 
@@ -1170,8 +1041,6 @@ void __ref kmemleak_not_leak(const void *ptr)
 
 	if (kmemleak_enabled && ptr && !IS_ERR(ptr))
 		make_gray_object((unsigned long)ptr);
-	else if (kmemleak_early_log)
-		log_early(KMEMLEAK_NOT_LEAK, ptr, 0, 0);
 }
 EXPORT_SYMBOL(kmemleak_not_leak);
 
@@ -1190,8 +1059,6 @@ void __ref kmemleak_ignore(const void *ptr)
 
 	if (kmemleak_enabled && ptr && !IS_ERR(ptr))
 		make_black_object((unsigned long)ptr);
-	else if (kmemleak_early_log)
-		log_early(KMEMLEAK_IGNORE, ptr, 0, 0);
 }
 EXPORT_SYMBOL(kmemleak_ignore);
 
@@ -1212,8 +1079,6 @@ void __ref kmemleak_scan_area(const void *ptr, size_t size, gfp_t gfp)
 
 	if (kmemleak_enabled && ptr && size && !IS_ERR(ptr))
 		add_scan_area((unsigned long)ptr, size, gfp);
-	else if (kmemleak_early_log)
-		log_early(KMEMLEAK_SCAN_AREA, ptr, size, 0);
 }
 EXPORT_SYMBOL(kmemleak_scan_area);
 
@@ -1232,8 +1097,6 @@ void __ref kmemleak_no_scan(const void *ptr)
 
 	if (kmemleak_enabled && ptr && !IS_ERR(ptr))
 		object_no_scan((unsigned long)ptr);
-	else if (kmemleak_early_log)
-		log_early(KMEMLEAK_NO_SCAN, ptr, 0, 0);
 }
 EXPORT_SYMBOL(kmemleak_no_scan);
 
@@ -2020,7 +1883,6 @@ static void kmemleak_disable(void)
 
 	/* stop any memory operation tracing */
 	kmemleak_enabled = 0;
-	kmemleak_early_log = 0;
 
 	/* check whether it is too early for a kernel thread */
 	if (kmemleak_initialized)
@@ -2048,20 +1910,11 @@ static int __init kmemleak_boot_config(char *str)
 }
 early_param("kmemleak", kmemleak_boot_config);
 
-static void __init print_log_trace(struct early_log *log)
-{
-	pr_notice("Early log backtrace:\n");
-	stack_trace_print(log->trace, log->trace_len, 2);
-}
-
 /*
  * Kmemleak initialization.
  */
 void __init kmemleak_init(void)
 {
-	int i;
-	unsigned long flags;
-
 #ifdef CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF
 	if (!kmemleak_skip_disable) {
 		kmemleak_disable();
@@ -2069,28 +1922,15 @@ void __init kmemleak_init(void)
 	}
 #endif
 
+	if (kmemleak_error)
+		return;
+
 	jiffies_min_age = msecs_to_jiffies(MSECS_MIN_AGE);
 	jiffies_scan_wait = msecs_to_jiffies(SECS_SCAN_WAIT * 1000);
 
 	object_cache = KMEM_CACHE(kmemleak_object, SLAB_NOLEAKTRACE);
 	scan_area_cache = KMEM_CACHE(kmemleak_scan_area, SLAB_NOLEAKTRACE);
 
-	if (crt_early_log > ARRAY_SIZE(early_log))
-		pr_warn("Early log buffer exceeded (%d), please increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE\n",
-			crt_early_log);
-
-	/* the kernel is still in UP mode, so disabling the IRQs is enough */
-	local_irq_save(flags);
-	kmemleak_early_log = 0;
-	if (kmemleak_error) {
-		local_irq_restore(flags);
-		return;
-	} else {
-		kmemleak_enabled = 1;
-		kmemleak_free_enabled = 1;
-	}
-	local_irq_restore(flags);
-
 	/* register the data/bss sections */
 	create_object((unsigned long)_sdata, _edata - _sdata,
 		      KMEMLEAK_GREY, GFP_ATOMIC);
@@ -2101,57 +1941,6 @@ void __init kmemleak_init(void)
 		create_object((unsigned long)__start_ro_after_init,
 			      __end_ro_after_init - __start_ro_after_init,
 			      KMEMLEAK_GREY, GFP_ATOMIC);
-
-	/*
-	 * This is the point where tracking allocations is safe. Automatic
-	 * scanning is started during the late initcall. Add the early logged
-	 * callbacks to the kmemleak infrastructure.
-	 */
-	for (i = 0; i < crt_early_log; i++) {
-		struct early_log *log = &early_log[i];
-
-		switch (log->op_type) {
-		case KMEMLEAK_ALLOC:
-			early_alloc(log);
-			break;
-		case KMEMLEAK_ALLOC_PERCPU:
-			early_alloc_percpu(log);
-			break;
-		case KMEMLEAK_FREE:
-			kmemleak_free(log->ptr);
-			break;
-		case KMEMLEAK_FREE_PART:
-			kmemleak_free_part(log->ptr, log->size);
-			break;
-		case KMEMLEAK_FREE_PERCPU:
-			kmemleak_free_percpu(log->ptr);
-			break;
-		case KMEMLEAK_NOT_LEAK:
-			kmemleak_not_leak(log->ptr);
-			break;
-		case KMEMLEAK_IGNORE:
-			kmemleak_ignore(log->ptr);
-			break;
-		case KMEMLEAK_SCAN_AREA:
-			kmemleak_scan_area(log->ptr, log->size, GFP_KERNEL);
-			break;
-		case KMEMLEAK_NO_SCAN:
-			kmemleak_no_scan(log->ptr);
-			break;
-		case KMEMLEAK_SET_EXCESS_REF:
-			object_set_excess_ref((unsigned long)log->ptr,
-					      log->excess_ref);
-			break;
-		default:
-			kmemleak_warn("Unknown early log operation: %d\n",
-				      log->op_type);
-		}
-
-		if (kmemleak_warning) {
-			print_log_trace(log);
-			kmemleak_warning = 0;
-		}
-	}
 }
 
 /*


* Re: [PATCH v3 0/3] mm: kmemleak: Use a memory pool for kmemleak object allocations
  2019-08-12 16:06 [PATCH v3 0/3] mm: kmemleak: Use a memory pool for kmemleak object allocations Catalin Marinas
                   ` (2 preceding siblings ...)
  2019-08-12 16:06 ` [PATCH v3 3/3] mm: kmemleak: Use the memory pool for early allocations Catalin Marinas
@ 2019-08-12 21:07 ` Andrew Morton
  2019-08-13  9:40   ` Catalin Marinas
       [not found] ` <08847a90-c37b-890f-8d0e-3ae1c3a1dd71@mellanox.com>
  4 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2019-08-12 21:07 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-mm, linux-kernel, Michal Hocko, Matthew Wilcox, Qian Cai

On Mon, 12 Aug 2019 17:06:39 +0100 Catalin Marinas <catalin.marinas@arm.com> wrote:

> Following the discussions on v2 of this patch(set) [1], this series
> takes a slightly different approach:
> 
> - it implements its own simple memory pool that does not rely on the
>   slab allocator
> 
> - drops the early log buffer logic entirely since it can now allocate
>   metadata from the memory pool directly before kmemleak is fully
>   initialised
> 
> - CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE option is renamed to
>   CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE
> 
> - moves the kmemleak_init() call earlier (mm_init())
> 
> - to avoid a separate memory pool for struct scan_area, it makes the
>   tool robust when such allocations fail as scan areas are rather an
>   optimisation
> 
> [1] http://lkml.kernel.org/r/20190727132334.9184-1-catalin.marinas@arm.com

Using the term "memory pool" is a little unfortunate, but better than
using "mempool"!

The changelog doesn't answer the very first question: why not use
mempools.  Please send along a paragraph which explains this decision.


* Re: [PATCH v3 3/3] mm: kmemleak: Use the memory pool for early allocations
  2019-08-12 16:06 ` [PATCH v3 3/3] mm: kmemleak: Use the memory pool for early allocations Catalin Marinas
@ 2019-08-13  9:35   ` Catalin Marinas
  2019-08-13 12:35   ` Qian Cai
  1 sibling, 0 replies; 17+ messages in thread
From: Catalin Marinas @ 2019-08-13  9:35 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Michal Hocko, Matthew Wilcox, Qian Cai

On Mon, Aug 12, 2019 at 05:06:42PM +0100, Catalin Marinas wrote:
> @@ -466,9 +419,13 @@ static struct kmemleak_object *mem_pool_alloc(gfp_t gfp)
>  	struct kmemleak_object *object;
>  
>  	/* try the slab allocator first */
> -	object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
> -	if (object)
> -		return object;
> +	if (object_cache) {
> +		object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
> +		if (object)
> +			return object;
> +		else
> +			WARN_ON_ONCE(1);

Oops, this was actually my debug warning just to make sure it triggered
(tested with failslab). The WARN_ON_ONCE(1) should be removed (I changed
it locally in case I post an update).

I noticed it in Andrew's subsequent checkpatch fix.

-- 
Catalin


* Re: [PATCH v3 2/3] mm: kmemleak: Simple memory allocation pool for kmemleak objects
  2019-08-12 16:06 ` [PATCH v3 2/3] mm: kmemleak: Simple memory allocation pool for kmemleak objects Catalin Marinas
@ 2019-08-13  9:37   ` Catalin Marinas
  0 siblings, 0 replies; 17+ messages in thread
From: Catalin Marinas @ 2019-08-13  9:37 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Michal Hocko, Matthew Wilcox, Qian Cai

On Mon, Aug 12, 2019 at 05:06:41PM +0100, Catalin Marinas wrote:
> Add a memory pool for struct kmemleak_object in case the normal
> kmem_cache_alloc() fails under the gfp constraints passed by the caller.
> The mem_pool[] array size is currently fixed at 16000.

Following Andrew's comment, I'd add this paragraph here:

-----------8<------------------------
We are not using the existing mempool kernel API since this requires the
slab allocator to be available (for pool->elements allocation). A
subsequent kmemleak patch will replace the static early log buffer with
the pool allocation introduced here, and this functionality is required
to be available before the slab allocator is initialised.
-----------8<------------------------

(patch updated locally)

-- 
Catalin


* Re: [PATCH v3 0/3] mm: kmemleak: Use a memory pool for kmemleak object allocations
  2019-08-12 21:07 ` [PATCH v3 0/3] mm: kmemleak: Use a memory pool for kmemleak object allocations Andrew Morton
@ 2019-08-13  9:40   ` Catalin Marinas
  0 siblings, 0 replies; 17+ messages in thread
From: Catalin Marinas @ 2019-08-13  9:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Michal Hocko, Matthew Wilcox, Qian Cai

On Mon, Aug 12, 2019 at 02:07:30PM -0700, Andrew Morton wrote:
> On Mon, 12 Aug 2019 17:06:39 +0100 Catalin Marinas <catalin.marinas@arm.com> wrote:
> 
> > Following the discussions on v2 of this patch(set) [1], this series
> > takes a slightly different approach:
> > 
> > - it implements its own simple memory pool that does not rely on the
> >   slab allocator
> > 
> > - drops the early log buffer logic entirely since it can now allocate
> >   metadata from the memory pool directly before kmemleak is fully
> >   initialised
> > 
> > - CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE option is renamed to
> >   CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE
> > 
> > - moves the kmemleak_init() call earlier (mm_init())
> > 
> > - to avoid a separate memory pool for struct scan_area, it makes the
> >   tool robust when such allocations fail as scan areas are rather an
> >   optimisation
> > 
> > [1] http://lkml.kernel.org/r/20190727132334.9184-1-catalin.marinas@arm.com
> 
> Using the term "memory pool" is a little unfortunate, but better than
> using "mempool"!

I agree, it could have been more inspired. What about "metadata pool"
(together with function name updates etc.)? Happy to send a v4.

> The changelog doesn't answer the very first question: why not use
> mempools.  Please send along a paragraph which explains this decision.

I posted one in reply to the patch where the changelog should be
updated.

Thanks.

-- 
Catalin


* Re: [PATCH v3 3/3] mm: kmemleak: Use the memory pool for early allocations
  2019-08-12 16:06 ` [PATCH v3 3/3] mm: kmemleak: Use the memory pool for early allocations Catalin Marinas
  2019-08-13  9:35   ` Catalin Marinas
@ 2019-08-13 12:35   ` Qian Cai
  2019-08-13 13:49     ` Catalin Marinas
  1 sibling, 1 reply; 17+ messages in thread
From: Qian Cai @ 2019-08-13 12:35 UTC (permalink / raw)
  To: Catalin Marinas, linux-mm
  Cc: linux-kernel, Andrew Morton, Michal Hocko, Matthew Wilcox

On Mon, 2019-08-12 at 17:06 +0100, Catalin Marinas wrote:
> Currently kmemleak uses a static early_log buffer to trace all memory
> allocation/freeing before the slab allocator is initialised. Such early
> log is replayed during kmemleak_init() to properly initialise the
> kmemleak metadata for objects allocated up to that point. With a memory
> pool that does not rely on the slab allocator, it is possible to skip
> this early log entirely.
> 
> In order to remove the early logging, consider kmemleak_enabled == 1 by
> default while the kmem_cache availability is checked directly on the
> object_cache and scan_area_cache variables. The RCU callback is only
> invoked after object_cache has been initialised as we wouldn't have any
> concurrent list traversal before this.
> 
> In order to reduce the number of callbacks before kmemleak is fully
> initialised, move the kmemleak_init() call to mm_init().
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  init/main.c       |   2 +-
>  lib/Kconfig.debug |  11 +-
>  mm/kmemleak.c     | 267 +++++-----------------------------------------
>  3 files changed, 35 insertions(+), 245 deletions(-)
> 
> diff --git a/init/main.c b/init/main.c
> index 96f8d5af52d6..ca05e3cd7ef7 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -556,6 +556,7 @@ static void __init mm_init(void)
>  	report_meminit();
>  	mem_init();
>  	kmem_cache_init();
> +	kmemleak_init();
>  	pgtable_init();
>  	debug_objects_mem_init();
>  	vmalloc_init();
> @@ -740,7 +741,6 @@ asmlinkage __visible void __init start_kernel(void)
>  		initrd_start = 0;
>  	}
>  #endif
> -	kmemleak_init();
>  	setup_per_cpu_pageset();
>  	numa_policy_init();
>  	acpi_early_init();
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 4d39540011e2..39df06ffd9f4 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -592,17 +592,18 @@ config DEBUG_KMEMLEAK
>  	  In order to access the kmemleak file, debugfs needs to be
>  	  mounted (usually at /sys/kernel/debug).
>  
> -config DEBUG_KMEMLEAK_EARLY_LOG_SIZE
> -	int "Maximum kmemleak early log entries"
> +config DEBUG_KMEMLEAK_MEM_POOL_SIZE
> +	int "Kmemleak memory pool size"
>  	depends on DEBUG_KMEMLEAK
>  	range 200 40000
>  	default 16000

Hmm, this seems way too small. My previous round of testing with
kmemleak.mempool=524288 works quite well on all architectures.


* Re: [PATCH v3 3/3] mm: kmemleak: Use the memory pool for early allocations
  2019-08-13 12:35   ` Qian Cai
@ 2019-08-13 13:49     ` Catalin Marinas
  0 siblings, 0 replies; 17+ messages in thread
From: Catalin Marinas @ 2019-08-13 13:49 UTC (permalink / raw)
  To: Qian Cai
  Cc: linux-mm, linux-kernel, Andrew Morton, Michal Hocko, Matthew Wilcox

On Tue, Aug 13, 2019 at 08:35:54AM -0400, Qian Cai wrote:
> On Mon, 2019-08-12 at 17:06 +0100, Catalin Marinas wrote:
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 4d39540011e2..39df06ffd9f4 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -592,17 +592,18 @@ config DEBUG_KMEMLEAK
> >  	  In order to access the kmemleak file, debugfs needs to be
> >  	  mounted (usually at /sys/kernel/debug).
> >  
> > -config DEBUG_KMEMLEAK_EARLY_LOG_SIZE
> > -	int "Maximum kmemleak early log entries"
> > +config DEBUG_KMEMLEAK_MEM_POOL_SIZE
> > +	int "Kmemleak memory pool size"
> >  	depends on DEBUG_KMEMLEAK
> >  	range 200 40000
> >  	default 16000
> 
> Hmm, this seems way too small. My previous round of testing with
> kmemleak.mempool=524288 works quite well on all architectures.

We can change the upper bound here to 1M but I'd keep the default sane.
Not everyone is running tests under OOM.
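For concreteness, the resulting entry would look something like the sketch below; the 1M upper bound is only the value floated above, not a committed change:

```
config DEBUG_KMEMLEAK_MEM_POOL_SIZE
	int "Kmemleak memory pool size"
	depends on DEBUG_KMEMLEAK
	range 200 1000000
	default 16000
```

Users who need a larger pool for stress testing can then raise it in their own config while the default stays modest.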

-- 
Catalin


* Re: [PATCH v3 1/3] mm: kmemleak: Make the tool tolerant to struct scan_area allocation failures
  2019-08-12 16:06 ` [PATCH v3 1/3] mm: kmemleak: Make the tool tolerant to struct scan_area allocation failures Catalin Marinas
@ 2019-10-03  6:13   ` Alexey Kardashevskiy
  2019-10-03  8:36     ` Qian Cai
  2019-10-03  8:41     ` Catalin Marinas
  0 siblings, 2 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2019-10-03  6:13 UTC (permalink / raw)
  To: Catalin Marinas, linux-mm
  Cc: linux-kernel, Andrew Morton, Michal Hocko, Matthew Wilcox, Qian Cai



On 13/08/2019 02:06, Catalin Marinas wrote:
> Object scan areas are an optimisation aimed to decrease the false
> positives and slightly improve the scanning time of large objects known
> to only have a few specific pointers. If a struct scan_area fails to
> allocate, kmemleak can still function normally by scanning the full
> object.
> 
> Introduce an OBJECT_FULL_SCAN flag and mark objects as such when
> scan_area allocation fails.


I came across this one while bisecting a sudden drop in throughput of a 100Gbit Mellanox CX4 ethernet card in a PPC
POWER9 system; the speed dropped from 100Gbit to about 40Gbit. Bisect pointed at dba82d943177; these are the relevant
config options:

[fstn1-p1 kernel]$ grep KMEMLEAK ~/pbuild/kernel-le-4g/.config
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=16000
# CONFIG_DEBUG_KMEMLEAK_TEST is not set
# CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y

Setting CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=400 or even 4000 (this is what KMEMLEAK_EARLY_LOG_SIZE is now in the master)
produces soft lockups on the recent upstream (sha1 a3c0e7b1fe1f):

[c000001fde64fb60] [c000000000c24ed4] _raw_write_unlock_irqrestore+0x54/0x70
[c000001fde64fb90] [c0000000004117e4] find_and_remove_object+0xa4/0xd0
[c000001fde64fbe0] [c000000000411c74] delete_object_full+0x24/0x50
[c000001fde64fc00] [c000000000411d28] __kmemleak_do_cleanup+0x88/0xd0
[c000001fde64fc40] [c00000000012a1a4] process_one_work+0x374/0x6a0
[c000001fde64fd20] [c00000000012a548] worker_thread+0x78/0x5a0
[c000001fde64fdb0] [c000000000135508] kthread+0x198/0x1a0
[c000001fde64fe20] [c00000000000b980] ret_from_kernel_thread+0x5c/0x7c

KMEMLEAK_EARLY_LOG_SIZE=8000 works but slow.

Interestingly KMEMLEAK_EARLY_LOG_SIZE=400 on dba82d943177 still worked and I saw my 100Gbit. Disabling KMEMLEAK also
fixes the speed (apparently).

Is that something expected? Thanks,



> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  mm/kmemleak.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index f6e602918dac..5ba7fad00fda 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -168,6 +168,8 @@ struct kmemleak_object {
>  #define OBJECT_REPORTED		(1 << 1)
>  /* flag set to not scan the object */
>  #define OBJECT_NO_SCAN		(1 << 2)
> +/* flag set to fully scan the object when scan_area allocation failed */
> +#define OBJECT_FULL_SCAN	(1 << 3)
>  
>  #define HEX_PREFIX		"    "
>  /* number of bytes to print per line; must be 16 or 32 */
> @@ -773,12 +775,14 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp)
>  	}
>  
>  	area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp));
> -	if (!area) {
> -		pr_warn("Cannot allocate a scan area\n");
> -		goto out;
> -	}
>  
>  	spin_lock_irqsave(&object->lock, flags);
> +	if (!area) {
> +		pr_warn_once("Cannot allocate a scan area, scanning the full object\n");
> +		/* mark the object for full scan to avoid false positives */
> +		object->flags |= OBJECT_FULL_SCAN;
> +		goto out_unlock;
> +	}
>  	if (size == SIZE_MAX) {
>  		size = object->pointer + object->size - ptr;
>  	} else if (ptr + size > object->pointer + object->size) {
> @@ -795,7 +799,6 @@ static void add_scan_area(unsigned long ptr, size_t size, gfp_t gfp)
>  	hlist_add_head(&area->node, &object->area_list);
>  out_unlock:
>  	spin_unlock_irqrestore(&object->lock, flags);
> -out:
>  	put_object(object);
>  }
>  
> @@ -1408,7 +1411,8 @@ static void scan_object(struct kmemleak_object *object)
>  	if (!(object->flags & OBJECT_ALLOCATED))
>  		/* already freed object */
>  		goto out;
> -	if (hlist_empty(&object->area_list)) {
> +	if (hlist_empty(&object->area_list) ||
> +	    object->flags & OBJECT_FULL_SCAN) {
>  		void *start = (void *)object->pointer;
>  		void *end = (void *)(object->pointer + object->size);
>  		void *next;
> 

-- 
Alexey


* Re: [PATCH v3 1/3] mm: kmemleak: Make the tool tolerant to struct scan_area allocation failures
  2019-10-03  6:13   ` Alexey Kardashevskiy
@ 2019-10-03  8:36     ` Qian Cai
  2019-10-03  8:41     ` Catalin Marinas
  1 sibling, 0 replies; 17+ messages in thread
From: Qian Cai @ 2019-10-03  8:36 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Catalin Marinas, linux-mm, linux-kernel, Andrew Morton,
	Michal Hocko, Matthew Wilcox



> On Oct 3, 2019, at 2:13 AM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
> I came across this one while bisecting a sudden drop in throughput of a 100Gbit Mellanox CX4 ethernet card in a PPC POWER9
> system; the speed dropped from 100Gbit to about 40Gbit. Bisect pointed at dba82d943177; these are the relevant config
> options:
> 
> [fstn1-p1 kernel]$ grep KMEMLEAK ~/pbuild/kernel-le-4g/.config
> CONFIG_HAVE_DEBUG_KMEMLEAK=y
> CONFIG_DEBUG_KMEMLEAK=y
> CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=16000
> # CONFIG_DEBUG_KMEMLEAK_TEST is not set
> # CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
> CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y
> 
> Setting CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=400 or even 4000 (this is what KMEMLEAK_EARLY_LOG_SIZE is now in the master)
> produces soft lockups on the recent upstream (sha1 a3c0e7b1fe1f):
> 
> [c000001fde64fb60] [c000000000c24ed4] _raw_write_unlock_irqrestore+0x54/0x70
> [c000001fde64fb90] [c0000000004117e4] find_and_remove_object+0xa4/0xd0
> [c000001fde64fbe0] [c000000000411c74] delete_object_full+0x24/0x50
> [c000001fde64fc00] [c000000000411d28] __kmemleak_do_cleanup+0x88/0xd0
> [c000001fde64fc40] [c00000000012a1a4] process_one_work+0x374/0x6a0
> [c000001fde64fd20] [c00000000012a548] worker_thread+0x78/0x5a0
> [c000001fde64fdb0] [c000000000135508] kthread+0x198/0x1a0
> [c000001fde64fe20] [c00000000000b980] ret_from_kernel_thread+0x5c/0x7c
> 
> KMEMLEAK_EARLY_LOG_SIZE=8000 works but slow.
> 
> Interestingly KMEMLEAK_EARLY_LOG_SIZE=400 on dba82d943177 still worked and I saw my 100Gbit. Disabling KMEMLEAK also
> fixes the speed (apparently).
> 
> Is that something expected? Thanks,

It is expected that a debug option like this will make the system slower.


* Re: [PATCH v3 1/3] mm: kmemleak: Make the tool tolerant to struct scan_area allocation failures
  2019-10-03  6:13   ` Alexey Kardashevskiy
  2019-10-03  8:36     ` Qian Cai
@ 2019-10-03  8:41     ` Catalin Marinas
  2019-10-05  3:08       ` Alexey Kardashevskiy
  1 sibling, 1 reply; 17+ messages in thread
From: Catalin Marinas @ 2019-10-03  8:41 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linux-mm, linux-kernel, Andrew Morton, Michal Hocko,
	Matthew Wilcox, Qian Cai

On Thu, Oct 03, 2019 at 04:13:07PM +1000, Alexey Kardashevskiy wrote:
> On 13/08/2019 02:06, Catalin Marinas wrote:
> > Object scan areas are an optimisation aimed to decrease the false
> > positives and slightly improve the scanning time of large objects known
> > to only have a few specific pointers. If a struct scan_area fails to
> > allocate, kmemleak can still function normally by scanning the full
> > object.
> > 
> > Introduce an OBJECT_FULL_SCAN flag and mark objects as such when
> > scan_area allocation fails.
> 
> I came across this one while bisecting a sudden drop in throughput of a
> 100Gbit Mellanox CX4 ethernet card in a PPC POWER9 system; the speed
> dropped from 100Gbit to about 40Gbit. Bisect pointed at dba82d943177;
> these are the relevant config options:
> 
> [fstn1-p1 kernel]$ grep KMEMLEAK ~/pbuild/kernel-le-4g/.config
> CONFIG_HAVE_DEBUG_KMEMLEAK=y
> CONFIG_DEBUG_KMEMLEAK=y
> CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=16000
> # CONFIG_DEBUG_KMEMLEAK_TEST is not set
> # CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
> CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y

The throughput drop is probably caused by kmemleak slowing down
all memory allocations (including skb). So that's expected. You may get
similar drop with other debug options like lock proving, kasan.

> Setting CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=400 or even 4000 (this is
> what KMEMLEAK_EARLY_LOG_SIZE is now in the master) produces soft
> lockups on the recent upstream (sha1 a3c0e7b1fe1f):
> 
> [c000001fde64fb60] [c000000000c24ed4] _raw_write_unlock_irqrestore+0x54/0x70
> [c000001fde64fb90] [c0000000004117e4] find_and_remove_object+0xa4/0xd0
> [c000001fde64fbe0] [c000000000411c74] delete_object_full+0x24/0x50
> [c000001fde64fc00] [c000000000411d28] __kmemleak_do_cleanup+0x88/0xd0
> [c000001fde64fc40] [c00000000012a1a4] process_one_work+0x374/0x6a0
> [c000001fde64fd20] [c00000000012a548] worker_thread+0x78/0x5a0
> [c000001fde64fdb0] [c000000000135508] kthread+0x198/0x1a0
> [c000001fde64fe20] [c00000000000b980] ret_from_kernel_thread+0x5c/0x7c

That's the kmemleak disabling path. I don't have the full log but I
suspect that by setting a small pool size, kmemleak failed to allocate
memory and disabled itself. The clean-up above tries to remove the
allocated metadata. It seems that it takes significant time on your
platform. Not sure how to avoid the soft lock-up, but I wouldn't worry
too much about it; it's only triggered by a previous error condition
disabling kmemleak.

> KMEMLEAK_EARLY_LOG_SIZE=8000 works but slow.
> 
> Interestingly KMEMLEAK_EARLY_LOG_SIZE=400 on dba82d943177 still worked
> and I saw my 100Gbit. Disabling KMEMLEAK also fixes the speed
> (apparently).

A small memory pool for kmemleak just disables it shortly after boot, so
it's no longer in the way and you get your throughput back.

> Is that something expected? Thanks,

Yes for the throughput. Not sure about the soft lock-up. Do you have the
full log around the lock-up?

-- 
Catalin


* Re: [PATCH v3 1/3] mm: kmemleak: Make the tool tolerant to struct scan_area allocation failures
  2019-10-03  8:41     ` Catalin Marinas
@ 2019-10-05  3:08       ` Alexey Kardashevskiy
  2019-10-07  9:56         ` Catalin Marinas
  0 siblings, 1 reply; 17+ messages in thread
From: Alexey Kardashevskiy @ 2019-10-05  3:08 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-mm, linux-kernel, Andrew Morton, Michal Hocko,
	Matthew Wilcox, Qian Cai



On 03/10/2019 18:41, Catalin Marinas wrote:
> On Thu, Oct 03, 2019 at 04:13:07PM +1000, Alexey Kardashevskiy wrote:
>> On 13/08/2019 02:06, Catalin Marinas wrote:
>>> Object scan areas are an optimisation aimed to decrease the false
>>> positives and slightly improve the scanning time of large objects known
>>> to only have a few specific pointers. If a struct scan_area fails to
>>> allocate, kmemleak can still function normally by scanning the full
>>> object.
>>>
>>> Introduce an OBJECT_FULL_SCAN flag and mark objects as such when
>>> scan_area allocation fails.
>>
>> I came across this one while bisecting a sudden drop in throughput of a
>> 100Gbit Mellanox CX4 ethernet card in a PPC POWER9 system; the speed
>> dropped from 100Gbit to about 40Gbit. Bisect pointed at dba82d943177;
>> these are the relevant config options:
>>
>> [fstn1-p1 kernel]$ grep KMEMLEAK ~/pbuild/kernel-le-4g/.config
>> CONFIG_HAVE_DEBUG_KMEMLEAK=y
>> CONFIG_DEBUG_KMEMLEAK=y
>> CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=16000
>> # CONFIG_DEBUG_KMEMLEAK_TEST is not set
>> # CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
>> CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y
> 
> The throughput drop is probably caused by kmemleak slowing down
> all memory allocations (including skb). So that's expected. You may get
> similar drop with other debug options like lock proving, kasan.

I was not precise. I meant that before dba82d943177 kmemleak would work but would not slow the network down (at least
at 100Gbit), and now it does, which is a downgrade. So I was wondering: did kmemleak just get so much better as to
justify this change, or is there a bug somewhere? Or did "LOG_SIZE=400" never really work? See my findings below, though.

If it was always slow, it is expected indeed.

> 
>> Setting CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=400 or even 4000 (this is
>> what KMEMLEAK_EARLY_LOG_SIZE is now in the master) produces soft
>> lockups on the recent upstream (sha1 a3c0e7b1fe1f):
>>
>> [c000001fde64fb60] [c000000000c24ed4] _raw_write_unlock_irqrestore+0x54/0x70
>> [c000001fde64fb90] [c0000000004117e4] find_and_remove_object+0xa4/0xd0
>> [c000001fde64fbe0] [c000000000411c74] delete_object_full+0x24/0x50
>> [c000001fde64fc00] [c000000000411d28] __kmemleak_do_cleanup+0x88/0xd0
>> [c000001fde64fc40] [c00000000012a1a4] process_one_work+0x374/0x6a0
>> [c000001fde64fd20] [c00000000012a548] worker_thread+0x78/0x5a0
>> [c000001fde64fdb0] [c000000000135508] kthread+0x198/0x1a0
>> [c000001fde64fe20] [c00000000000b980] ret_from_kernel_thread+0x5c/0x7c
> 
> That's the kmemleak disabling path. I don't have the full log but I
> suspect by setting a small pool size, kmemleak failed to allocate memory
> and went into disabling itself. The clean-up above tries to remove the
> allocated metadata. It seems that it takes significant time on your
> platform. Not sure how to avoid the soft lock-up but I wouldn't bother
> too much about it, it's only triggered by a previous error condition
> disabling kmemleak.
> 
>> KMEMLEAK_EARLY_LOG_SIZE=8000 works but slow.
>>
>> Interestingly KMEMLEAK_EARLY_LOG_SIZE=400 on dba82d943177 still worked
>> and I saw my 100Gbit. Disabling KMEMLEAK also fixes the speed
>> (apparently).
> 
> A small memory pool for kmemleak just disables it shortly after boot, so
> it's no longer in the way and you get your throughput back.
> 
>> Is that something expected? Thanks,
> 
> Yes for the throughput. Not sure about the soft lock-up. Do you have the
> full log around the lock-up?

I was going to post one, but then I received "kmemleak: Do not corrupt the object_list during clean-up", which fixed
the lockups and took throughput back to normal; I'll reply there too. Thanks,


-- 
Alexey


* Re: [PATCH v3 1/3] mm: kmemleak: Make the tool tolerant to struct scan_area allocation failures
  2019-10-05  3:08       ` Alexey Kardashevskiy
@ 2019-10-07  9:56         ` Catalin Marinas
  0 siblings, 0 replies; 17+ messages in thread
From: Catalin Marinas @ 2019-10-07  9:56 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linux-mm, linux-kernel, Andrew Morton, Michal Hocko,
	Matthew Wilcox, Qian Cai

On Sat, Oct 05, 2019 at 01:08:43PM +1000, Alexey Kardashevskiy wrote:
> On 03/10/2019 18:41, Catalin Marinas wrote:
> > On Thu, Oct 03, 2019 at 04:13:07PM +1000, Alexey Kardashevskiy wrote:
> >> On 13/08/2019 02:06, Catalin Marinas wrote:
> >>> Object scan areas are an optimisation aimed to decrease the false
> >>> positives and slightly improve the scanning time of large objects known
> >>> to only have a few specific pointers. If a struct scan_area fails to
> >>> allocate, kmemleak can still function normally by scanning the full
> >>> object.
> >>>
> >>> Introduce an OBJECT_FULL_SCAN flag and mark objects as such when
> >>> scan_area allocation fails.
> >>
> >> I came across this one while bisecting a sudden drop in throughput of a
> >> 100Gbit Mellanox CX4 ethernet card in a PPC POWER9 system; the speed
> >> dropped from 100Gbit to about 40Gbit. Bisect pointed at dba82d943177;
> >> these are the relevant config options:
> >>
> >> [fstn1-p1 kernel]$ grep KMEMLEAK ~/pbuild/kernel-le-4g/.config
> >> CONFIG_HAVE_DEBUG_KMEMLEAK=y
> >> CONFIG_DEBUG_KMEMLEAK=y
> >> CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=16000
> >> # CONFIG_DEBUG_KMEMLEAK_TEST is not set
> >> # CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
> >> CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y
> > 
> > The throughput drop is probably caused by kmemleak slowing down
> > all memory allocations (including skb). So that's expected. You may get
> > similar drop with other debug options like lock proving, kasan.
> 
> I was not precise. I meant that before dba82d943177 kmemleak would
> work but would not slow the network down (at least at 100Gbit), and
> now it does, which is a downgrade. So I was wondering: did kmemleak
> just get so much better as to justify this change, or is there a bug
> somewhere? Or did "LOG_SIZE=400" never really work?

I suspect LOG_SIZE=400 never worked on your system (you can check the
log for any kmemleak disabled messages).

Thanks for testing the patch.

-- 
Catalin


* Re: [PATCH v3 0/3] mm: kmemleak: Use a memory pool for kmemleak object allocations
       [not found] ` <08847a90-c37b-890f-8d0e-3ae1c3a1dd71@mellanox.com>
@ 2019-12-03 16:08   ` Catalin Marinas
  2019-12-05 16:16     ` Noam Stolero
  0 siblings, 1 reply; 17+ messages in thread
From: Catalin Marinas @ 2019-12-03 16:08 UTC (permalink / raw)
  To: Noam Stolero
  Cc: linux-mm, linux-kernel, Andrew Morton, Michal Hocko,
	Matthew Wilcox, Qian Cai, Tal Gilboa, Tariq Toukan,
	Saeed Mahameed, Amir Ancel, Matan Nir, Bar Tuaf, brouer,
	edumazet, netdev

On Tue, Dec 03, 2019 at 03:51:50PM +0000, Noam Stolero wrote:
> On 8/12/2019 7:06 PM, Catalin Marinas wrote:
> >     Following the discussions on v2 of this patch(set) [1], this series
> >     takes slightly different approach:
> > 
> >     - it implements its own simple memory pool that does not rely on the
> >       slab allocator
> > 
> >     - drops the early log buffer logic entirely since it can now allocate
> >       metadata from the memory pool directly before kmemleak is fully
> >       initialised
> > 
> >     - CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE option is renamed to
> >       CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE
> > 
> >     - moves the kmemleak_init() call earlier (mm_init())
> > 
> >     - to avoid a separate memory pool for struct scan_area, it makes the
> >       tool robust when such allocations fail as scan areas are rather an
> >       optimisation
> > 
> >     [1] http://lkml.kernel.org/r/20190727132334.9184-1-catalin.marinas@arm.com
> > 
> >     Catalin Marinas (3):
> >       mm: kmemleak: Make the tool tolerant to struct scan_area allocation
> >         failures
> >       mm: kmemleak: Simple memory allocation pool for kmemleak objects
> >       mm: kmemleak: Use the memory pool for early allocations
> > 
> >      init/main.c       |   2 +-
> >      lib/Kconfig.debug |  11 +-
> >      mm/kmemleak.c     | 325 ++++++++++++----------------------------------
> >      3 files changed, 91 insertions(+), 247 deletions(-)
> 
> We observe severe degradation in our network performance affecting all
> of our NICs. The degradation is directly linked to this patch.
> 
> What we run:
> Simple Iperf TCP loopback with 8 streams on ConnectX5-100GbE.
> Since it's a loopback test, traffic goes from the socket through the IP
> stack and back to the socket, without going through the NIC driver.

Something similar was reported before. Can you try commit 2abd839aa7e6
("kmemleak: Do not corrupt the object_list during clean-up") and see if
it fixes the problem for you? It was merged in 5.4-rc4.

-- 
Catalin


* Re: [PATCH v3 0/3] mm: kmemleak: Use a memory pool for kmemleak object allocations
  2019-12-03 16:08   ` Catalin Marinas
@ 2019-12-05 16:16     ` Noam Stolero
  0 siblings, 0 replies; 17+ messages in thread
From: Noam Stolero @ 2019-12-05 16:16 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-mm, linux-kernel, Andrew Morton, Michal Hocko,
	Matthew Wilcox, Qian Cai, Tal Gilboa, Tariq Toukan,
	Saeed Mahameed, Amir Ancel, Matan Nir, Bar Tuaf, brouer,
	edumazet, netdev, Moshe Shemesh

On 12/3/2019 6:08 PM, Catalin Marinas wrote:
> On Tue, Dec 03, 2019 at 03:51:50PM +0000, Noam Stolero wrote:
>> On 8/12/2019 7:06 PM, Catalin Marinas wrote:
>>>      Following the discussions on v2 of this patch(set) [1], this series
>>>      takes slightly different approach:
>>>
>>>      - it implements its own simple memory pool that does not rely on the
>>>        slab allocator
>>>
>>>      - drops the early log buffer logic entirely since it can now allocate
>>>        metadata from the memory pool directly before kmemleak is fully
>>>        initialised
>>>
>>>      - CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE option is renamed to
>>>        CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE
>>>
>>>      - moves the kmemleak_init() call earlier (mm_init())
>>>
>>>      - to avoid a separate memory pool for struct scan_area, it makes the
>>>        tool robust when such allocations fail as scan areas are rather an
>>>        optimisation
>>>
>>>      [1] http://lkml.kernel.org/r/20190727132334.9184-1-catalin.marinas@arm.com
>>>
>>>      Catalin Marinas (3):
>>>        mm: kmemleak: Make the tool tolerant to struct scan_area allocation
>>>          failures
>>>        mm: kmemleak: Simple memory allocation pool for kmemleak objects
>>>        mm: kmemleak: Use the memory pool for early allocations
>>>
>>>       init/main.c       |   2 +-
>>>       lib/Kconfig.debug |  11 +-
>>>       mm/kmemleak.c     | 325 ++++++++++++----------------------------------
>>>       3 files changed, 91 insertions(+), 247 deletions(-)
>> We observe severe degradation in our network performance affecting all
>> of our NICs. The degradation is directly linked to this patch.
>>
>> What we run:
>> Simple Iperf TCP loopback with 8 streams on ConnectX5-100GbE.
>> Since it's a loopback test, traffic goes from the socket through the IP
>> stack and back to the socket, without going through the NIC driver.
> Something similar was reported before. Can you try commit 2abd839aa7e6
> ("kmemleak: Do not corrupt the object_list during clean-up") and see if
> it fixes the problem for you? It was merged in 5.4-rc4.

I've tested this commit, as well as the 5.4.0-rc6 and 5.4.0 GA versions.
We no longer see the issue I mentioned.

Thank you for the quick response and the assistance.


end of thread, other threads:[~2019-12-05 16:16 UTC | newest]

Thread overview: 17+ messages
2019-08-12 16:06 [PATCH v3 0/3] mm: kmemleak: Use a memory pool for kmemleak object allocations Catalin Marinas
2019-08-12 16:06 ` [PATCH v3 1/3] mm: kmemleak: Make the tool tolerant to struct scan_area allocation failures Catalin Marinas
2019-10-03  6:13   ` Alexey Kardashevskiy
2019-10-03  8:36     ` Qian Cai
2019-10-03  8:41     ` Catalin Marinas
2019-10-05  3:08       ` Alexey Kardashevskiy
2019-10-07  9:56         ` Catalin Marinas
2019-08-12 16:06 ` [PATCH v3 2/3] mm: kmemleak: Simple memory allocation pool for kmemleak objects Catalin Marinas
2019-08-13  9:37   ` Catalin Marinas
2019-08-12 16:06 ` [PATCH v3 3/3] mm: kmemleak: Use the memory pool for early allocations Catalin Marinas
2019-08-13  9:35   ` Catalin Marinas
2019-08-13 12:35   ` Qian Cai
2019-08-13 13:49     ` Catalin Marinas
2019-08-12 21:07 ` [PATCH v3 0/3] mm: kmemleak: Use a memory pool for kmemleak object allocations Andrew Morton
2019-08-13  9:40   ` Catalin Marinas
     [not found] ` <08847a90-c37b-890f-8d0e-3ae1c3a1dd71@mellanox.com>
2019-12-03 16:08   ` Catalin Marinas
2019-12-05 16:16     ` Noam Stolero
