* [PATCH 00/15] Kernel memory leak detector
@ 2008-11-29 10:43 Catalin Marinas
  2008-11-29 10:43 ` [PATCH 01/15] kmemleak: Add the base support Catalin Marinas
                   ` (14 more replies)
  0 siblings, 15 replies; 35+ messages in thread
From: Catalin Marinas @ 2008-11-29 10:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

Here's a new version of the kernel memory leak detector. Kmemleak can
also be found in a branch on this git tree:

git://linux-arm.org/linux-2.6.git kmemleak

Changes since the previous release:

- kmemleak uses the default allocator (slab, slob or slub) instead of
  its own (which was removed), making the code simpler
- automatic scanning via a kernel thread every 10 min. The first scan
  starts 1 min after boot. This thread prints possible memory leaks
  the first time they are encountered. The /sys/kernel/debug/memleak
  file can still be read to manually trigger a scan and show all the
  possible memory leaks (see the example below)
- manual scans can now be interrupted via signals
- implemented the comments received so far
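
For example, a manual scan can be triggered with:

  # mount -t debugfs nodev /sys/kernel/debug/
  # cat /sys/kernel/debug/memleak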

Still to do:

- run-time and boot-time configuration options like task stack
  scanning, disabling kmemleak, enabling/disabling the automatic scanning

Thanks for your comments.


Catalin Marinas (15):
      kmemleak: Add the corresponding MAINTAINERS entry
      kmemleak: Simple testing module for kmemleak
      kmemleak: Keep the __init functions after initialization
      kmemleak: Enable the building of the memory leak detector
      kmemleak: Remove some of the kmemleak false positives
      arm: Provide _sdata and __bss_stop in the vmlinux.lds.S file
      x86: Provide _sdata in the vmlinux_*.lds.S files
      kmemleak: Add modules support
      kmemleak: Add memleak_alloc callback from alloc_large_system_hash
      kmemleak: Add the vmalloc memory allocation/freeing hooks
      kmemleak: Add the slub memory allocation/freeing hooks
      kmemleak: Add the slob memory allocation/freeing hooks
      kmemleak: Add the slab memory allocation/freeing hooks
      kmemleak: Add documentation on the memory leak detector
      kmemleak: Add the base support


 Documentation/kmemleak.txt       |  124 +++++
 MAINTAINERS                      |    6 
 arch/arm/kernel/vmlinux.lds.S    |    2 
 arch/x86/kernel/vmlinux_32.lds.S |    1 
 arch/x86/kernel/vmlinux_64.lds.S |    1 
 drivers/char/vt.c                |    7 
 include/linux/init.h             |    6 
 include/linux/memleak.h          |   87 +++
 include/linux/percpu.h           |    5 
 include/linux/slab.h             |    2 
 init/main.c                      |    4 
 kernel/module.c                  |   55 ++
 lib/Kconfig.debug                |   46 ++
 mm/Makefile                      |    2 
 mm/memleak-test.c                |  109 ++++
 mm/memleak.c                     | 1019 ++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c                  |    3 
 mm/slab.c                        |   16 +
 mm/slob.c                        |   15 -
 mm/slub.c                        |    5 
 mm/vmalloc.c                     |   29 +
 21 files changed, 1533 insertions(+), 11 deletions(-)
 create mode 100644 Documentation/kmemleak.txt
 create mode 100644 include/linux/memleak.h
 create mode 100644 mm/memleak-test.c
 create mode 100644 mm/memleak.c

-- 
Catalin


* [PATCH 01/15] kmemleak: Add the base support
  2008-11-29 10:43 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
@ 2008-11-29 10:43 ` Catalin Marinas
  2008-12-01  5:39   ` Yasunori Goto
                     ` (2 more replies)
  2008-11-29 10:43 ` [PATCH 02/15] kmemleak: Add documentation on the memory leak detector Catalin Marinas
                   ` (13 subsequent siblings)
  14 siblings, 3 replies; 35+ messages in thread
From: Catalin Marinas @ 2008-11-29 10:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

This patch adds the base support for the kernel memory leak
detector. It traces the memory allocation/freeing in a way similar to
Boehm's conservative garbage collector, the difference being that
the unreferenced objects are not freed but only shown in
/sys/kernel/debug/memleak. Enabling this feature introduces an
overhead to memory allocations.
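
For illustration, a report in /sys/kernel/debug/memleak follows the
format produced by print_unreferenced() below; the address, comm and
backtrace in this sample are made up:

  unreferenced object 0xc8853000 (size 64):
    comm "modprobe", pid 720, jiffies 4294907233
    backtrace:
      [<c01e3a47>] kmem_cache_alloc
      [<bf002074>] my_leaky_init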

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 include/linux/memleak.h |   87 ++++
 init/main.c             |    4 
 mm/memleak.c            | 1019 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1109 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/memleak.h
 create mode 100644 mm/memleak.c

diff --git a/include/linux/memleak.h b/include/linux/memleak.h
new file mode 100644
index 0000000..2756a05
--- /dev/null
+++ b/include/linux/memleak.h
@@ -0,0 +1,87 @@
+/*
+ * include/linux/memleak.h
+ *
+ * Copyright (C) 2008 ARM Limited
+ * Written by Catalin Marinas <catalin.marinas@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef __MEMLEAK_H
+#define __MEMLEAK_H
+
+#ifdef CONFIG_DEBUG_MEMLEAK
+
+extern void memleak_init(void);
+extern void memleak_alloc(const void *ptr, size_t size, int ref_count);
+extern void memleak_free(const void *ptr);
+extern void memleak_padding(const void *ptr, unsigned long offset, size_t size);
+extern void memleak_not_leak(const void *ptr);
+extern void memleak_ignore(const void *ptr);
+extern void memleak_scan_area(const void *ptr, unsigned long offset, size_t length);
+
+static inline void memleak_alloc_recursive(const void *ptr, size_t size,
+					   int ref_count, unsigned long flags)
+{
+	if (!(flags & SLAB_NOLEAKTRACE))
+		memleak_alloc(ptr, size, ref_count);
+}
+
+static inline void memleak_free_recursive(const void *ptr, unsigned long flags)
+{
+	if (!(flags & SLAB_NOLEAKTRACE))
+		memleak_free(ptr);
+}
+
+static inline void memleak_erase(void **ptr)
+{
+	*ptr = NULL;
+}
+
+#else
+
+#define DECLARE_MEMLEAK_OFFSET(name, type, member)
+
+static inline void memleak_init(void)
+{
+}
+static inline void memleak_alloc(const void *ptr, size_t size, int ref_count)
+{
+}
+static inline void memleak_alloc_recursive(const void *ptr, size_t size,
+					   int ref_count, unsigned long flags)
+{
+}
+static inline void memleak_free(const void *ptr)
+{
+}
+static inline void memleak_free_recursive(const void *ptr, unsigned long flags)
+{
+}
+static inline void memleak_not_leak(const void *ptr)
+{
+}
+static inline void memleak_ignore(const void *ptr)
+{
+}
+static inline void memleak_scan_area(const void *ptr, unsigned long offset, size_t length)
+{
+}
+static inline void memleak_erase(void **ptr)
+{
+}
+
+#endif	/* CONFIG_DEBUG_MEMLEAK */
+
+#endif	/* __MEMLEAK_H */
diff --git a/init/main.c b/init/main.c
index 7e117a2..81cbbb7 100644
--- a/init/main.c
+++ b/init/main.c
@@ -56,6 +56,7 @@
 #include <linux/debug_locks.h>
 #include <linux/debugobjects.h>
 #include <linux/lockdep.h>
+#include <linux/memleak.h>
 #include <linux/pid_namespace.h>
 #include <linux/device.h>
 #include <linux/kthread.h>
@@ -653,6 +654,8 @@ asmlinkage void __init start_kernel(void)
 	enable_debug_pagealloc();
 	cpu_hotplug_init();
 	kmem_cache_init();
+	prio_tree_init();
+	memleak_init();
 	debug_objects_mem_init();
 	idr_init_cache();
 	setup_per_cpu_pageset();
@@ -662,7 +665,6 @@ asmlinkage void __init start_kernel(void)
 	calibrate_delay();
 	pidmap_init();
 	pgtable_cache_init();
-	prio_tree_init();
 	anon_vma_init();
 #ifdef CONFIG_X86
 	if (efi_enabled)
diff --git a/mm/memleak.c b/mm/memleak.c
new file mode 100644
index 0000000..1b09ca2
--- /dev/null
+++ b/mm/memleak.c
@@ -0,0 +1,1019 @@
+/*
+ * mm/memleak.c
+ *
+ * Copyright (C) 2008 ARM Limited
+ * Written by Catalin Marinas <catalin.marinas@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/sched.h>
+#include <linux/jiffies.h>
+#include <linux/delay.h>
+#include <linux/module.h>
+#include <linux/kthread.h>
+#include <linux/prio_tree.h>
+#include <linux/gfp.h>
+#include <linux/kallsyms.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/cpumask.h>
+#include <linux/spinlock.h>
+#include <linux/mutex.h>
+#include <linux/rcupdate.h>
+#include <linux/stacktrace.h>
+#include <linux/cache.h>
+#include <linux/percpu.h>
+#include <linux/hardirq.h>
+#include <linux/mmzone.h>
+#include <linux/slab.h>
+
+#include <asm/sections.h>
+#include <asm/processor.h>
+#include <asm/thread_info.h>
+#include <asm/atomic.h>
+
+#include <linux/memleak.h>
+
+/*
+ * Kmemleak configuration and common defines.
+ */
+#define MAX_TRACE		16	/* stack trace length */
+#define REPORTS_NR		100	/* maximum number of reported leaks */
+#define MSECS_MIN_AGE		1000	/* minimum object age for leak reporting */ 
+#undef SCAN_TASK_STACKS			/* scan the task kernel stacks */
+#undef REPORT_ORPHAN_FREEING		/* notify when freeing orphan objects */
+
+#define BYTES_PER_WORD		sizeof(void *)
+#define MSECS_SCAN_YIELD	10
+#define SECS_FIRST_SCAN		60
+#define SECS_SCAN_PERIOD	600
+
+/* scanning area inside a memory block */
+struct memleak_scan_area {
+	struct hlist_node node;
+	unsigned long offset;
+	size_t length;
+};
+
+/* the main allocation tracking object */
+struct memleak_object {
+	spinlock_t lock;
+	unsigned long flags;
+	struct list_head object_list;
+	struct list_head gray_list;
+	struct prio_tree_node tree_node;
+	struct rcu_head rcu;		/* used for object_list lockless traversal */
+	atomic_t use_count;		/* internal usage count */
+	unsigned long pointer;
+	size_t size;
+	int ref_count;			/* the minimum encounters of the pointer */
+	int count;			/* the encounters of the pointer */
+	struct hlist_head area_list;	/* areas to be scanned (or empty for all) */
+	unsigned long trace[MAX_TRACE];
+	unsigned int trace_len;
+	unsigned long jiffies;		/* creation timestamp */
+	pid_t pid;			/* pid of the current task */
+	char comm[TASK_COMM_LEN];	/* executable name */
+};
+
+/* The list of all allocated objects */
+static LIST_HEAD(object_list);
+static DEFINE_SPINLOCK(object_list_lock);
+/* The list of the gray objects */
+static LIST_HEAD(gray_list);
+/* prio search tree for object boundaries */
+static struct prio_tree_root object_tree_root;
+static DEFINE_RWLOCK(object_tree_lock);
+
+/* allocation caches */
+static struct kmem_cache *object_cache;
+static struct kmem_cache *scan_area_cache;
+
+static atomic_t memleak_enabled = ATOMIC_INIT(0);
+
+/* minimum and maximum address that may be valid pointers */
+static unsigned long min_addr = ~0;
+static unsigned long max_addr;
+
+/* used for yielding the CPU to other tasks during scanning */
+static unsigned long next_scan_yield;
+static struct task_struct *scan_thread;
+static DEFINE_MUTEX(scan_mutex);
+static int reported_leaks;
+static int scan_should_stop;
+
+static unsigned long jiffies_scan_yield;
+static unsigned long jiffies_min_age;
+
+/*
+ * Early object allocation (before kmemleak is fully initialised).
+ */
+#define EARLY_LOG_NR		200
+
+enum {
+	MEMLEAK_ALLOC,
+	MEMLEAK_FREE,
+	MEMLEAK_NOT_LEAK,
+	MEMLEAK_IGNORE,
+	MEMLEAK_SCAN_AREA,
+};
+
+struct early_log {
+	int op_type;			/* kmemleak operation type */
+	const void *ptr;
+	size_t size;
+	int ref_count;
+	unsigned long offset;
+	size_t length;
+};
+
+static struct early_log __initdata early_log[EARLY_LOG_NR];
+static int __initdata crt_early_log;
+
+/* object flags */
+#define OBJECT_ALLOCATED	(1 << 0)
+#define OBJECT_REPORTED		(1 << 1)
+
+/*
+ * Object colors, encoded with count and ref_count:
+ *   - white - orphan object, i.e. not enough references to it (count < ref_count)
+ *   - gray  - referred at least ref_count times and therefore non-orphan (count >= ref_count)
+ *   - black - ignore; it doesn't contain references (e.g. text section) (ref_count == -1).
+ * Newly created objects don't have any color (object->count == -1) before
+ * the next memory scan when they become white.
+ */
+static inline int color_white(const struct memleak_object *object)
+{
+	return object->count != -1 && object->count < object->ref_count;
+}
+
+static inline int color_gray(const struct memleak_object *object)
+{
+	return object->ref_count != -1 && object->count >= object->ref_count;
+}
+
+static inline int color_black(const struct memleak_object *object)
+{
+	return object->ref_count == -1;
+}
+
+static inline int unreferenced_object(struct memleak_object *object)
+{
+	if (color_white(object) &&
+	    (object->flags & OBJECT_ALLOCATED) &&
+	    time_is_before_eq_jiffies(object->jiffies + jiffies_min_age))
+		return 1;
+	else
+		return 0;
+}
+
+#define print_seq(seq, x...)			\
+do {						\
+	if (seq)				\
+		seq_printf(seq, x);		\
+	else					\
+		pr_info(x);			\
+} while (0)
+
+static void print_unreferenced(struct seq_file *seq,
+			       struct memleak_object *object)
+{
+	char namebuf[KSYM_NAME_LEN + 1] = "";
+	char *modname;
+	unsigned long symsize;
+	unsigned long offset = 0;
+	int i;
+
+	print_seq(seq, "unreferenced object 0x%08lx (size %zu):\n",
+		  object->pointer, object->size);
+	print_seq(seq, "  comm \"%s\", pid %d, jiffies %lu\n",
+		  object->comm, object->pid, object->jiffies);
+	print_seq(seq, "  backtrace:\n");
+
+	for (i = 0; i < object->trace_len; i++) {
+		unsigned long trace = object->trace[i];
+
+		kallsyms_lookup(trace, &symsize, &offset, &modname, namebuf);
+		print_seq(seq, "    [<%08lx>] %s\n", trace, namebuf);
+	}
+}
+
+static void dump_object_info(struct memleak_object *object)
+{
+	struct stack_trace trace;
+
+	trace.nr_entries = object->trace_len;
+	trace.entries = object->trace;
+
+	pr_notice("kmemleak: object 0x%08lx (size %zu):\n",
+		  object->tree_node.start, object->size);
+	pr_notice("  comm \"%s\", pid %d, jiffies %lu\n",
+		  object->comm, object->pid, object->jiffies);
+	pr_notice("  ref_count = %d\n", object->ref_count);
+	pr_notice("  count = %d\n", object->count);
+	pr_notice("  backtrace:\n");
+	print_stack_trace(&trace, 4);
+}
+
+static struct memleak_object *lookup_object(unsigned long ptr, int alias)
+{
+	struct prio_tree_node *node;
+	struct prio_tree_iter iter;
+	struct memleak_object *object;
+
+	prio_tree_iter_init(&iter, &object_tree_root, ptr, ptr);
+	node = prio_tree_next(&iter);
+	if (node) {
+		object = prio_tree_entry(node, struct memleak_object, tree_node);
+		if (!alias && object->pointer != ptr) {
+			pr_warning("kmemleak: found object by alias");
+			object = NULL;
+		}
+	} else
+		object = NULL;
+
+	return object;
+}
+
+/*
+ * Return 1 if successful or 0 otherwise.
+ */
+static inline int get_object(struct memleak_object *object)
+{
+	return atomic_inc_not_zero(&object->use_count);
+}
+
+static void free_object_rcu(struct rcu_head *rcu)
+{
+	struct hlist_node *elem, *tmp;
+	struct memleak_scan_area *area;
+	struct memleak_object *object =
+		container_of(rcu, struct memleak_object, rcu);
+
+	/* once use_count is 0, there is no code accessing the object */
+	hlist_for_each_entry_safe(area, elem, tmp, &object->area_list, node) {
+		hlist_del(elem);
+		kmem_cache_free(scan_area_cache, area);
+	}
+	kmem_cache_free(object_cache, object);
+}
+
+static void put_object(struct memleak_object *object)
+{
+	unsigned long flags;
+
+	if (!atomic_dec_and_test(&object->use_count))
+		return;
+
+	/* should only get here after delete_object was called */
+	BUG_ON(object->flags & OBJECT_ALLOCATED);
+
+	spin_lock_irqsave(&object_list_lock, flags);
+	/* the last reference to this object */
+	list_del_rcu(&object->object_list);
+	call_rcu(&object->rcu, free_object_rcu);
+	spin_unlock_irqrestore(&object_list_lock, flags);
+}
+
+static struct memleak_object *find_and_get_object(unsigned long ptr, int alias)
+{
+	unsigned long flags;
+	struct memleak_object *object;
+
+	read_lock_irqsave(&object_tree_lock, flags);
+	object = lookup_object(ptr, alias);
+	if (object)
+		get_object(object);
+	read_unlock_irqrestore(&object_tree_lock, flags);
+
+	return object;
+}
+
+/*
+ * Insert a pointer into the prio search tree.
+ */
+static inline void create_object(unsigned long ptr, size_t size, int ref_count)
+{
+	unsigned long flags;
+	struct memleak_object *object;
+	struct prio_tree_node *node;
+	struct stack_trace trace;
+
+	object = kmem_cache_alloc(object_cache, GFP_ATOMIC);
+	if (!object)
+		panic("kmemleak: cannot allocate a memleak_object structure\n");
+
+	INIT_LIST_HEAD(&object->object_list);
+	INIT_LIST_HEAD(&object->gray_list);
+	INIT_HLIST_HEAD(&object->area_list);
+	spin_lock_init(&object->lock);
+	atomic_set(&object->use_count, 1);
+	object->flags = OBJECT_ALLOCATED;
+	object->pointer = ptr;
+	object->size = size;
+	object->ref_count = ref_count;
+	object->count = -1;				/* no color initially */
+	object->jiffies = jiffies;
+	if (in_irq()) {
+		object->pid = 0;
+		strncpy(object->comm, "hardirq", TASK_COMM_LEN);
+	} else if (in_softirq()) {
+		object->pid = 0;
+		strncpy(object->comm, "softirq", TASK_COMM_LEN);
+	} else {
+		object->pid = current->pid;
+		strncpy(object->comm, current->comm, TASK_COMM_LEN);
+	}
+
+	trace.max_entries = MAX_TRACE;
+	trace.nr_entries = 0;
+	trace.entries = object->trace;
+	trace.skip = 1;
+	save_stack_trace(&trace);
+
+	object->trace_len = trace.nr_entries;
+
+	INIT_PRIO_TREE_NODE(&object->tree_node);
+	object->tree_node.start = ptr;
+	object->tree_node.last = ptr + size - 1;
+
+	if (ptr < min_addr)
+		min_addr = ptr;
+	if (ptr + size > max_addr)
+		max_addr = ptr + size;
+	/*
+	 * Update the boundaries before inserting the object in the
+	 * prio search tree.
+	 */
+	smp_mb();
+
+	write_lock_irqsave(&object_tree_lock, flags);
+	node = prio_tree_insert(&object_tree_root, &object->tree_node);
+	if (node != &object->tree_node) {
+		unsigned long flags;
+
+		pr_warning("kmemleak: existing pointer\n");
+		dump_stack();
+
+		object = lookup_object(ptr, 1);
+		spin_lock_irqsave(&object->lock, flags);
+		dump_object_info(object);
+		spin_unlock_irqrestore(&object->lock, flags);
+
+		panic("kmemleak: cannot insert 0x%lx into the object search tree\n",
+		      ptr);
+	}
+	write_unlock_irqrestore(&object_tree_lock, flags);
+
+	spin_lock_irqsave(&object_list_lock, flags);
+	list_add_tail_rcu(&object->object_list, &object_list);
+	spin_unlock_irqrestore(&object_list_lock, flags);
+}
+
+/*
+ * Remove a pointer from the prio search tree.
+ */
+static inline void delete_object(unsigned long ptr)
+{
+	unsigned long flags;
+	struct memleak_object *object;
+
+	write_lock_irqsave(&object_tree_lock, flags);
+	object = lookup_object(ptr, 0);
+	if (!object) {
+		pr_warning("kmemleak: freeing unknown object at 0x%08lx\n", ptr);
+		dump_stack();
+		write_unlock_irqrestore(&object_tree_lock, flags);
+		return;
+	}
+	prio_tree_remove(&object_tree_root, &object->tree_node);
+	write_unlock_irqrestore(&object_tree_lock, flags);
+
+	BUG_ON(!(object->flags & OBJECT_ALLOCATED));
+
+	spin_lock_irqsave(&object->lock, flags);
+	object->flags &= ~OBJECT_ALLOCATED;
+#ifdef REPORT_ORPHAN_FREEING
+	if (color_white(object)) {
+		pr_warning("kmemleak: freeing orphan object 0x%08lx\n", ptr);
+		dump_stack();
+		dump_object_info(object);
+	}
+#endif
+	object->pointer = 0;
+	spin_unlock_irqrestore(&object->lock, flags);
+
+	put_object(object);
+}
+
+/*
+ * Make an object permanently gray (false positive).
+ */
+static inline void make_gray_object(unsigned long ptr)
+{
+	unsigned long flags;
+	struct memleak_object *object;
+
+	object = find_and_get_object(ptr, 0);
+	if (!object) {
+		dump_stack();
+		panic("kmemleak: graying unknown object at 0x%08lx\n", ptr);
+	}
+
+	spin_lock_irqsave(&object->lock, flags);
+	object->ref_count = 0;
+	spin_unlock_irqrestore(&object->lock, flags);
+	put_object(object);
+}
+
+/*
+ * Mark the object as black.
+ */
+static inline void make_black_object(unsigned long ptr)
+{
+	unsigned long flags;
+	struct memleak_object *object;
+
+	object = find_and_get_object(ptr, 0);
+	if (!object) {
+		dump_stack();
+		panic("kmemleak: blacking unknown object at 0x%08lx\n", ptr);
+	}
+
+	spin_lock_irqsave(&object->lock, flags);
+	object->ref_count = -1;
+	spin_unlock_irqrestore(&object->lock, flags);
+	put_object(object);
+}
+
+/*
+ * Add a scanning area to the object.
+ */
+static inline void add_scan_area(unsigned long ptr, unsigned long offset, size_t length)
+{
+	unsigned long flags;
+	struct memleak_object *object;
+	struct memleak_scan_area *area;
+
+	object = find_and_get_object(ptr, 0);
+	if (!object) {
+		dump_stack();
+		panic("kmemleak: adding scan area to unknown object at 0x%08lx\n", ptr);
+	}
+
+	area = kmem_cache_alloc(scan_area_cache, GFP_ATOMIC);
+	if (!area)
+		panic("kmemleak: cannot allocate a scan area\n");
+
+	spin_lock_irqsave(&object->lock, flags);
+	if (offset + length > object->size) {
+		dump_stack();
+		dump_object_info(object);
+		panic("kmemleak: scan area larger than object 0x%08lx\n", ptr);
+	}
+
+	INIT_HLIST_NODE(&area->node);
+	area->offset = offset;
+	area->length = length;
+
+	hlist_add_head(&area->node, &object->area_list);
+	spin_unlock_irqrestore(&object->lock, flags);
+	put_object(object);
+}
+
+/*
+ * Log the early memleak_* calls.
+ */
+static void __init memleak_early_log(int op_type, const void *ptr, size_t size,
+				     int ref_count,
+				     unsigned long offset, size_t length)
+{
+	unsigned long flags;
+	struct early_log *log;
+
+	if (crt_early_log >= EARLY_LOG_NR)
+		panic("kmemleak: early log buffer exceeded\n");
+
+	local_irq_save(flags);
+	log = &early_log[crt_early_log];
+	log->op_type = op_type;
+	log->ptr = ptr;
+	log->size = size;
+	log->ref_count = ref_count;
+	log->offset = offset;
+	log->length = length;
+	crt_early_log++;
+	local_irq_restore(flags);
+}
+
+/*
+ * Allocation function hook.
+ */
+void memleak_alloc(const void *ptr, size_t size, int ref_count)
+{
+	pr_debug("%s(0x%p, %zu, %d)\n", __FUNCTION__, ptr, size, ref_count);
+
+	if (!atomic_read(&memleak_enabled)) {
+		memleak_early_log(MEMLEAK_ALLOC, ptr, size, ref_count, 0, 0);
+		return;
+	}
+	if (!ptr)
+		return;
+
+	create_object((unsigned long)ptr, size, ref_count);
+}
+EXPORT_SYMBOL_GPL(memleak_alloc);
+
+/*
+ * Freeing function hook.
+ */
+void memleak_free(const void *ptr)
+{
+	pr_debug("%s(0x%p)\n", __FUNCTION__, ptr);
+
+	if (!atomic_read(&memleak_enabled)) {
+		memleak_early_log(MEMLEAK_FREE, ptr, 0, 0, 0, 0);
+		return;
+	}
+	if (!ptr)
+		return;
+
+	delete_object((unsigned long)ptr);
+}
+EXPORT_SYMBOL_GPL(memleak_free);
+
+/*
+ * Mark an object as a false positive.
+ */
+void memleak_not_leak(const void *ptr)
+{
+	pr_debug("%s(0x%p)\n", __FUNCTION__, ptr);
+
+	if (!atomic_read(&memleak_enabled)) {
+		memleak_early_log(MEMLEAK_NOT_LEAK, ptr, 0, 0, 0, 0);
+		return;
+	}
+	if (!ptr)
+		return;
+
+	make_gray_object((unsigned long)ptr);
+}
+EXPORT_SYMBOL(memleak_not_leak);
+
+/*
+ * Ignore this memory object.
+ */
+void memleak_ignore(const void *ptr)
+{
+	pr_debug("%s(0x%p)\n", __FUNCTION__, ptr);
+
+	if (!atomic_read(&memleak_enabled)) {
+		memleak_early_log(MEMLEAK_IGNORE, ptr, 0, 0, 0, 0);
+		return;
+	}
+	if (!ptr)
+		return;
+
+	make_black_object((unsigned long)ptr);
+}
+EXPORT_SYMBOL(memleak_ignore);
+
+/*
+ * Add a scanning area to an object.
+ */
+void memleak_scan_area(const void *ptr, unsigned long offset, size_t length)
+{
+	pr_debug("%s(0x%p)\n", __FUNCTION__, ptr);
+
+	if (!atomic_read(&memleak_enabled)) {
+		memleak_early_log(MEMLEAK_SCAN_AREA, ptr, 0, 0, offset, length);
+		return;
+	}
+	if (!ptr)
+		return;
+
+	add_scan_area((unsigned long)ptr, offset, length);
+}
+EXPORT_SYMBOL(memleak_scan_area);
+
+static inline void scan_yield(void)
+{
+	BUG_ON(in_atomic());
+
+	if (time_is_before_eq_jiffies(next_scan_yield)) {
+		schedule();
+		next_scan_yield = jiffies + jiffies_scan_yield;
+	}
+	if (signal_pending(current))
+		scan_should_stop = 1;
+}
+
+/*
+ * Scan a block of memory (exclusive range) for pointers and move
+ * those found to the gray list.
+ */
+static void scan_block(void *_start, void *_end, struct memleak_object *scanned)
+{
+	unsigned long *ptr;
+	unsigned long *start = PTR_ALIGN(_start, BYTES_PER_WORD);
+	unsigned long *end = _end - (BYTES_PER_WORD - 1);
+
+	for (ptr = start; ptr < end; ptr++) {
+		unsigned long flags;
+		unsigned long pointer = *ptr;
+		struct memleak_object *object;
+
+		if (scan_should_stop)
+			break;
+
+		/* If scanned, the CPU is yielded in the calling code */
+		if (!scanned)
+			scan_yield();
+
+		/*
+		 * The boundaries check doesn't need to be precise
+		 * (hence no locking) since orphan objects need to
+		 * pass a scanning threshold before being reported.
+		 */
+		if (pointer < min_addr || pointer >= max_addr)
+			continue;
+
+		object = find_and_get_object(pointer, 1);
+		if (!object)
+			continue;
+		if (object == scanned) {
+			/* self referenced */
+			put_object(object);
+			continue;
+		}
+
+		/*
+		 * Avoid the lockdep recursive warning on object->lock
+		 * being previously acquired in scan_object(). These
+		 * locks are enclosed by scan_mutex.
+		 */
+		spin_lock_irqsave_nested(&object->lock, flags, SINGLE_DEPTH_NESTING);
+		if (!color_white(object)) {
+			/* non-orphan, ignored or new */
+			spin_unlock_irqrestore(&object->lock, flags);
+			put_object(object);
+			continue;
+		}
+
+		object->count++;
+		if (color_gray(object))
+			/* the object became gray, add it to the list */
+			list_add_tail(&object->gray_list, &gray_list);
+		else
+			put_object(object);
+		spin_unlock_irqrestore(&object->lock, flags);
+	}
+}
+
+/*
+ * Scan a memory block represented by a memleak_object.
+ */
+static inline void scan_object(struct memleak_object *object)
+{
+	struct memleak_scan_area *area;
+	struct hlist_node *elem;
+	unsigned long flags;
+
+	spin_lock_irqsave(&object->lock, flags);
+
+	/* freed object */
+	if (!(object->flags & OBJECT_ALLOCATED))
+		goto out;
+
+	if (hlist_empty(&object->area_list))
+		scan_block((void *)object->pointer,
+			   (void *)(object->pointer + object->size), object);
+	else
+		hlist_for_each_entry(area, elem, &object->area_list, node)
+			scan_block((void *)(object->pointer + area->offset),
+				   (void *)(object->pointer + area->offset
+					    + area->length), object);
+
+ out:
+	spin_unlock_irqrestore(&object->lock, flags);
+}
+
+/*
+ * Scan the memory and print the orphan objects.
+ */
+static void memleak_scan(void)
+{
+	unsigned long flags;
+	struct memleak_object *object, *tmp;
+	int i;
+#ifdef SCAN_TASK_STACKS
+	struct task_struct *task;
+#endif
+
+	scan_should_stop = 0;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(object, &object_list, object_list) {
+		spin_lock_irqsave(&object->lock, flags);
+#ifdef DEBUG
+		/*
+		 * With a few exceptions there should be a maximum of
+		 * 1 reference to any object at this point.
+		 */
+		if (atomic_read(&object->use_count) > 1) {
+			pr_debug("kmemleak: object->use_count = %d\n",
+				 atomic_read(&object->use_count));
+			dump_object_info(object);
+		}
+#endif
+		/* reset the reference count (whiten the object) */
+		object->count = 0;
+		if (color_gray(object) && get_object(object))
+			list_add_tail(&object->gray_list, &gray_list);
+
+		spin_unlock_irqrestore(&object->lock, flags);
+	}
+	rcu_read_unlock();
+
+	/* data/bss scanning */
+	scan_block(_sdata, _edata, NULL);
+	scan_block(__bss_start, __bss_stop, NULL);
+
+#ifdef CONFIG_SMP
+	/* per-cpu scanning */
+	for_each_possible_cpu(i)
+		scan_block(__per_cpu_start + per_cpu_offset(i),
+			   __per_cpu_end + per_cpu_offset(i), NULL);
+#endif
+
+	/* mem_map scanning */
+	for_each_online_node(i) {
+		struct page *page, *end;
+
+		page = NODE_MEM_MAP(i);
+		end  = page + NODE_DATA(i)->node_spanned_pages;
+
+		scan_block(page, end, NULL);
+	}
+
+#ifdef SCAN_TASK_STACKS
+	read_lock(&tasklist_lock);
+	for_each_process(task)
+		scan_block(task_stack_page(task),
+			   task_stack_page(task) + THREAD_SIZE, NULL);
+	read_unlock(&tasklist_lock);
+#endif
+
+	/*
+	 * Scan the objects already referenced. More objects will be
+	 * referenced and, if there are no memory leaks, all the
+	 * objects will be scanned. The list traversal is safe for
+	 * both tail additions and removals from inside the loop. The
+	 * memleak objects cannot be freed from outside the loop
+	 * because their use_count was increased.
+	 */
+	object = list_entry(gray_list.next, typeof(*object), gray_list);
+	while (&object->gray_list != &gray_list) {
+		scan_yield();
+
+		/* may add new objects to the list */
+		if (!scan_should_stop)
+			scan_object(object);
+
+		tmp = list_entry(object->gray_list.next, typeof(*object),
+				 gray_list);
+
+		/* remove the object from the list and release it */
+		list_del(&object->gray_list);
+		put_object(object);
+
+		object = tmp;
+	}
+	BUG_ON(!list_empty(&gray_list));
+}
+
+static void *memleak_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	struct memleak_object *object;
+	loff_t n = *pos;
+
+	if (!n) {
+		memleak_scan();
+		reported_leaks = 0;
+	}
+	if (reported_leaks >= REPORTS_NR)
+		return NULL;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(object, &object_list, object_list) {
+		if (n-- > 0)
+			continue;
+
+		if (get_object(object))
+			goto out;
+	}
+	object = NULL;
+ out:
+	rcu_read_unlock();
+	return object;
+}
+
+static void *memleak_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct list_head *n;
+	struct memleak_object *next = NULL;
+	unsigned long flags;
+
+	++(*pos);
+	if (reported_leaks >= REPORTS_NR)
+		goto out;
+
+	spin_lock_irqsave(&object_list_lock, flags);
+	n = ((struct memleak_object *)v)->object_list.next;
+	if (n != &object_list) {
+		next = list_entry(n, struct memleak_object, object_list);
+		get_object(next);
+	}
+	spin_unlock_irqrestore(&object_list_lock, flags);
+
+ out:
+	put_object(v);
+	return next;
+}
+
+static void memleak_seq_stop(struct seq_file *seq, void *v)
+{
+	if (v)
+		put_object(v);
+}
+
+static int memleak_seq_show(struct seq_file *seq, void *v)
+{
+	struct memleak_object *object = v;
+	unsigned long flags;
+
+	spin_lock_irqsave(&object->lock, flags);
+	if (!unreferenced_object(object))
+		goto out;
+	print_unreferenced(seq, object);
+	reported_leaks++;
+out:
+	spin_unlock_irqrestore(&object->lock, flags);
+	return 0;
+}
+
+static struct seq_operations memleak_seq_ops = {
+	.start = memleak_seq_start,
+	.next  = memleak_seq_next,
+	.stop  = memleak_seq_stop,
+	.show  = memleak_seq_show,
+};
+
+static int memleak_seq_open(struct inode *inode, struct file *file)
+{
+	int ret = mutex_lock_interruptible(&scan_mutex);
+	if (ret < 0)
+		return ret;
+	ret = seq_open(file, &memleak_seq_ops);
+	if (ret < 0)
+		mutex_unlock(&scan_mutex);
+	return ret;
+}
+
+static int memleak_seq_release(struct inode *inode, struct file *file)
+{
+	int ret = seq_release(inode, file);
+	mutex_unlock(&scan_mutex);
+	return ret;
+}
+
+static struct file_operations memleak_fops = {
+	.owner	 = THIS_MODULE,
+	.open    = memleak_seq_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = memleak_seq_release,
+};
+
+static int memleak_scan_thread(void *arg)
+{
+	/* sleep before the first scan */
+	ssleep(SECS_FIRST_SCAN);
+
+	while (!kthread_should_stop()) {
+		struct memleak_object *object;
+
+		mutex_lock(&scan_mutex);
+
+		memleak_scan();
+		reported_leaks = 0;
+
+		rcu_read_lock();
+		list_for_each_entry_rcu(object, &object_list, object_list) {
+			unsigned long flags;
+
+			if (reported_leaks >= REPORTS_NR)
+				break;
+			spin_lock_irqsave(&object->lock, flags);
+			if (!(object->flags & OBJECT_REPORTED) &&
+			    unreferenced_object(object)) {
+				print_unreferenced(NULL, object);
+				object->flags |= OBJECT_REPORTED;
+				reported_leaks++;
+			}
+			spin_unlock_irqrestore(&object->lock, flags);
+		}
+		rcu_read_unlock();
+
+		mutex_unlock(&scan_mutex);
+		ssleep(SECS_SCAN_PERIOD);
+	}
+
+	return 0;
+}
+
+/*
+ * Kmemleak initialization.
+ */
+void __init memleak_init(void)
+{
+	int i;
+
+	jiffies_scan_yield = msecs_to_jiffies(MSECS_SCAN_YIELD);
+	jiffies_min_age = msecs_to_jiffies(MSECS_MIN_AGE);
+	object_cache = kmem_cache_create("kmemleak_object",
+					 sizeof(struct memleak_object),
+					 0, SLAB_NOLEAKTRACE, NULL);
+	scan_area_cache = kmem_cache_create("kmemleak_scan_area",
+					    sizeof(struct memleak_scan_area),
+					    0, SLAB_NOLEAKTRACE, NULL);
+	INIT_PRIO_TREE_ROOT(&object_tree_root);
+
+	/*
+	 * This is the point where tracking allocations is safe.
+	 * Scanning is only available later.
+	 */
+	atomic_set(&memleak_enabled, 1);
+
+	/* execute the early logged operations */
+	for (i = 0; i < crt_early_log; i++) {
+		struct early_log *log = &early_log[i];
+
+		switch (log->op_type) {
+		case MEMLEAK_ALLOC:
+			memleak_alloc(log->ptr, log->size, log->ref_count);
+			break;
+		case MEMLEAK_FREE:
+			memleak_free(log->ptr);
+			break;
+		case MEMLEAK_NOT_LEAK:
+			memleak_not_leak(log->ptr);
+			break;
+		case MEMLEAK_IGNORE:
+			memleak_ignore(log->ptr);
+			break;
+		case MEMLEAK_SCAN_AREA:
+			memleak_scan_area(log->ptr, log->offset, log->length);
+			break;
+		default:
+			BUG();
+		}
+	}
+}
+
+/*
+ * Late initialization function.
+ */
+static int __init memleak_late_init(void)
+{
+	struct dentry *dentry;
+
+	dentry = debugfs_create_file("memleak", S_IRUGO, NULL, NULL,
+				     &memleak_fops);
+	if (!dentry)
+		return -ENOMEM;
+
+	scan_thread = kthread_run(memleak_scan_thread, NULL, "kmemleak");
+	if (IS_ERR(scan_thread))
+		pr_warning("kmemleak: Failed to initialise the scan thread\n");
+
+	pr_info("Kernel memory leak detector initialized\n");
+
+	return 0;
+}
+late_initcall(memleak_late_init);



* [PATCH 02/15] kmemleak: Add documentation on the memory leak detector
  2008-11-29 10:43 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
  2008-11-29 10:43 ` [PATCH 01/15] kmemleak: Add the base support Catalin Marinas
@ 2008-11-29 10:43 ` Catalin Marinas
  2008-11-29 10:43 ` [PATCH 03/15] kmemleak: Add the slab memory allocation/freeing hooks Catalin Marinas
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2008-11-29 10:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

This patch adds the Documentation/kmemleak.txt file with some
information about how kmemleak works.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 Documentation/kmemleak.txt |  124 ++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 124 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/kmemleak.txt

diff --git a/Documentation/kmemleak.txt b/Documentation/kmemleak.txt
new file mode 100644
index 0000000..40ba6bc
--- /dev/null
+++ b/Documentation/kmemleak.txt
@@ -0,0 +1,124 @@
+Kernel Memory Leak Detector
+===========================
+
+Introduction
+------------
+
+Kmemleak provides a way of detecting possible kernel memory leaks in a
+manner similar to a tracing garbage collector
+(http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Tracing_garbage_collectors),
+with the difference that the orphan objects are not freed but only
+reported via /sys/kernel/debug/memleak. A similar method is used by
+the Valgrind tool (memcheck --leak-check) to detect the memory leaks
+in user-space applications.
+
+Usage
+-----
+
+CONFIG_DEBUG_MEMLEAK in "Kernel hacking" has to be enabled. A kernel
+thread scans the memory every 10 min (by default) and prints any new
+unreferenced objects found. To trigger an intermediate scan and
+display all the possible memory leaks:
+
+  # mount -t debugfs nodev /sys/kernel/debug/
+  # cat /sys/kernel/debug/memleak
+
+Note that the orphan objects are listed in the order they were
+allocated and one object at the beginning of the list may cause
+subsequent objects to be reported as orphans.
+
+Basic Algorithm
+---------------
+
+The memory allocations via kmalloc, vmalloc, kmem_cache_alloc and
+friends are tracked and the pointers, together with additional
+information like size and stack trace, are stored in a prio search
+tree. The corresponding freeing function calls are tracked and the
+pointers removed from the kmemleak data structures.
+
+An allocated block of memory is considered orphan if no pointer to its
+start address or to any location inside the block can be found by
+scanning the memory (including saved registers). This means that there
+might be no way for the kernel to pass the address of the allocated
+block to a freeing function and therefore the block is considered a
+memory leak.
+
+The scanning algorithm steps:
+
+  1. mark all objects as white (remaining white objects will later be
+     considered orphan)
+  2. scan the memory starting with the data section and stacks,
+     checking the values against the addresses stored in the prio
+     search tree. If a pointer to a white object is found, the object is
+     added to the grey list
+  3. scan the grey objects for matching addresses (some white objects
+     can become grey and added at the end of the grey list) until the
+     grey set is finished
+  4. the remaining white objects are considered orphan and reported
+     via /sys/kernel/debug/memleak
+
+Some allocated memory blocks have pointers stored in the kernel's
+internal data structures and they cannot be detected as orphans. To
+avoid this, kmemleak can also store the number of values pointing to
+an address inside the block address range that need to be found so
+that the block is not considered a leak. One example is __vmalloc().
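+
+__vmalloc(), for instance, keeps a second pointer to each block in the
+associated vm_struct, so the hooks added later in this series register
+the allocation with a ref_count of 2:
+
+  memleak_alloc(addr, real_size, 2);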
+
+Limitations and Drawbacks
+-------------------------
+
+The biggest drawback is the reduced performance of memory allocation
+and freeing. To avoid other penalties, the memory scanning is only
+performed by the periodic kernel thread or when the memleak debugfs
+file is read. However, this tool is intended for debugging, where
+performance might not be the most important requirement.
+
+To keep the algorithm simple, kmemleak scans for values pointing to
+any address inside a block's address range. This may lead to an
+increased number of false negatives. However, it is likely that a
+real memory leak will eventually become visible.
+
+Another source of false negatives is the data stored in non-pointer
+values. In a future version, kmemleak could only scan the pointer
+members in the allocated structures. This feature would solve many of
+the false negative cases described above.
+
+The tool can report false positives. These are cases where an
+allocated block doesn't need to be freed (some cases in the init_call
+functions), the pointer is calculated by other methods than the usual
+container_of macro or the pointer is stored in a location not scanned
+by kmemleak.
+
+Page allocations and ioremap are not tracked. Only the ARM and i386
+architectures are currently supported.
+
+Kmemleak API
+------------
+
+See the include/linux/memleak.h header for the function prototypes.
+
+memleak_init		- initialize kmemleak
+memleak_alloc		- notify of a memory block allocation
+memleak_free		- notify of a memory block freeing
+memleak_not_leak	- mark an object as not a leak
+memleak_ignore		- do not scan or report an object as leak
+memleak_scan_area	- add scan areas inside a memory block
+memleak_erase		- erase an old value in a pointer variable
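+
+A minimal usage sketch (illustrative only, not code from the kernel
+tree):
+
+  obj = kmalloc(sizeof(*obj), GFP_KERNEL);
+  /* the only reference is passed to hardware rather than kept in
+     scanned memory, so tell kmemleak not to report this block */
+  memleak_not_leak(obj);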
+
+Dealing with false positives/negatives
+--------------------------------------
+
+To reduce the false negatives, kmemleak provides the memleak_ignore,
+memleak_scan_area and memleak_erase functions. The task stacks also
+increase the amount of false negatives and their scanning is not
+enabled by default.
+
+For objects known not to be leaks, kmemleak provides the
+memleak_not_leak function. The memleak_ignore function can also be
+used if the memory block is known not to contain other pointers; it
+will no longer be scanned.
+
+Some of the reported leaks are only transient, especially on SMP
+systems, because of pointers temporarily stored in CPU registers or
+stacks. Kmemleak defines MSECS_MIN_AGE (defaulting to 1000)
+representing the minimum age of an object to be reported as a memory
+leak.



* [PATCH 03/15] kmemleak: Add the slab memory allocation/freeing hooks
  2008-11-29 10:43 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
  2008-11-29 10:43 ` [PATCH 01/15] kmemleak: Add the base support Catalin Marinas
  2008-11-29 10:43 ` [PATCH 02/15] kmemleak: Add documentation on the memory leak detector Catalin Marinas
@ 2008-11-29 10:43 ` Catalin Marinas
  2008-11-29 10:43 ` [PATCH 04/15] kmemleak: Add the slob " Catalin Marinas
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2008-11-29 10:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Pekka Enberg, Ingo Molnar

This patch adds the callbacks to memleak_(alloc|free) functions from the
slab allocator. The patch also adds the SLAB_NOLEAKTRACE flag to avoid
recursive calls to kmemleak when it allocates its own data structures.
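
For reference, kmemleak's own caches are created with this flag in
patch 01:

	object_cache = kmem_cache_create("kmemleak_object",
					 sizeof(struct memleak_object),
					 0, SLAB_NOLEAKTRACE, NULL);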

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pekka Enberg <penberg@gmail.com>
---
 include/linux/slab.h |    2 ++
 mm/slab.c            |   16 ++++++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 000da12..23adc83 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -62,6 +62,8 @@
 # define SLAB_DEBUG_OBJECTS	0x00000000UL
 #endif
 
+#define SLAB_NOLEAKTRACE	0x00800000UL	/* used internally by kmemleak */
+
 /* The following flags affect the page allocator grouping pages by mobility */
 #define SLAB_RECLAIM_ACCOUNT	0x00020000UL		/* Objects are reclaimable */
 #define SLAB_TEMPORARY		SLAB_RECLAIM_ACCOUNT	/* Objects are short-lived */
diff --git a/mm/slab.c b/mm/slab.c
index 0918751..762299e 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -106,6 +106,7 @@
 #include	<linux/string.h>
 #include	<linux/uaccess.h>
 #include	<linux/nodemask.h>
+#include	<linux/memleak.h>
 #include	<linux/mempolicy.h>
 #include	<linux/mutex.h>
 #include	<linux/fault-inject.h>
@@ -177,13 +178,13 @@
 			 SLAB_STORE_USER | \
 			 SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \
 			 SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD | \
-			 SLAB_DEBUG_OBJECTS)
+			 SLAB_DEBUG_OBJECTS | SLAB_NOLEAKTRACE)
 #else
 # define CREATE_MASK	(SLAB_HWCACHE_ALIGN | \
 			 SLAB_CACHE_DMA | \
 			 SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \
 			 SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD | \
-			 SLAB_DEBUG_OBJECTS)
+			 SLAB_DEBUG_OBJECTS | SLAB_NOLEAKTRACE)
 #endif
 
 /*
@@ -2610,6 +2611,12 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep, void *objp,
 		/* Slab management obj is off-slab. */
 		slabp = kmem_cache_alloc_node(cachep->slabp_cache,
 					      local_flags & ~GFP_THISNODE, nodeid);
+		/* 
+		 * Only scan the list member to avoid false negatives
+		 * (especially caused by the s_mem pointer)
+		 */
+		memleak_scan_area(slabp, offsetof(struct slab, list),
+				  sizeof(struct list_head));
 		if (!slabp)
 			return NULL;
 	} else {
@@ -3195,6 +3202,8 @@ static inline void *____cache_alloc(struct kmem_cache *cachep, gfp_t flags)
 		STATS_INC_ALLOCMISS(cachep);
 		objp = cache_alloc_refill(cachep, flags);
 	}
+	/* avoid false negatives */
+	memleak_erase(&ac->entry[ac->avail]);
 	return objp;
 }
 
@@ -3412,6 +3421,7 @@ __cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
   out:
 	local_irq_restore(save_flags);
 	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
+	memleak_alloc_recursive(ptr, obj_size(cachep), 1, cachep->flags);
 
 	if (unlikely((flags & __GFP_ZERO) && ptr))
 		memset(ptr, 0, obj_size(cachep));
@@ -3465,6 +3475,7 @@ __cache_alloc(struct kmem_cache *cachep, gfp_t flags, void *caller)
 	objp = __do_cache_alloc(cachep, flags);
 	local_irq_restore(save_flags);
 	objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
+	memleak_alloc_recursive(objp, obj_size(cachep), 1, cachep->flags);
 	prefetchw(objp);
 
 	if (unlikely((flags & __GFP_ZERO) && objp))
@@ -3580,6 +3591,7 @@ static inline void __cache_free(struct kmem_cache *cachep, void *objp)
 	struct array_cache *ac = cpu_cache_get(cachep);
 
 	check_irq_off();
+	memleak_free_recursive(objp, cachep->flags);
 	objp = cache_free_debugcheck(cachep, objp, __builtin_return_address(0));
 
 	/*



* [PATCH 04/15] kmemleak: Add the slob memory allocation/freeing hooks
  2008-11-29 10:43 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
                   ` (2 preceding siblings ...)
  2008-11-29 10:43 ` [PATCH 03/15] kmemleak: Add the slab memory allocation/freeing hooks Catalin Marinas
@ 2008-11-29 10:43 ` Catalin Marinas
  2008-11-30 17:30   ` Matt Mackall
  2008-11-29 10:43 ` [PATCH 05/15] kmemleak: Add the slub " Catalin Marinas
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 35+ messages in thread
From: Catalin Marinas @ 2008-11-29 10:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Pekka Enberg, Ingo Molnar, Matt Mackall

This patch adds the callbacks to memleak_(alloc|free) functions from the
slob allocator.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pekka Enberg <penberg@gmail.com>
Cc: Matt Mackall <mpm@selenic.com>
---
 mm/slob.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/slob.c b/mm/slob.c
index cb675d1..98f71af 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -60,6 +60,7 @@
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include <linux/mm.h>
+#include <linux/memleak.h>
 #include <linux/cache.h>
 #include <linux/init.h>
 #include <linux/module.h>
@@ -463,6 +464,7 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node)
 {
 	unsigned int *m;
 	int align = max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
+	void *ret;
 
 	if (size < PAGE_SIZE - align) {
 		if (!size)
@@ -472,18 +474,18 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node)
 		if (!m)
 			return NULL;
 		*m = size;
-		return (void *)m + align;
+		ret = (void *)m + align;
 	} else {
-		void *ret;
-
 		ret = slob_new_page(gfp | __GFP_COMP, get_order(size), node);
 		if (ret) {
 			struct page *page;
 			page = virt_to_page(ret);
 			page->private = size;
 		}
-		return ret;
 	}
+
+	memleak_alloc(ret, size, 1);
+	return ret;
 }
 EXPORT_SYMBOL(__kmalloc_node);
 
@@ -493,6 +495,7 @@ void kfree(const void *block)
 
 	if (unlikely(ZERO_OR_NULL_PTR(block)))
 		return;
+	memleak_free(block);
 
 	sp = (struct slob_page *)virt_to_page(block);
 	if (slob_page(sp)) {
@@ -555,12 +558,14 @@ struct kmem_cache *kmem_cache_create(const char *name, size_t size,
 	} else if (flags & SLAB_PANIC)
 		panic("Cannot create slab cache %s\n", name);
 
+	memleak_alloc(c, sizeof(struct kmem_cache), 1);
 	return c;
 }
 EXPORT_SYMBOL(kmem_cache_create);
 
 void kmem_cache_destroy(struct kmem_cache *c)
 {
+	memleak_free(c);
 	slob_free(c, sizeof(struct kmem_cache));
 }
 EXPORT_SYMBOL(kmem_cache_destroy);
@@ -577,6 +582,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
 	if (c->ctor)
 		c->ctor(b);
 
+	memleak_alloc_recursive(b, c->size, 1, c->flags);
 	return b;
 }
 EXPORT_SYMBOL(kmem_cache_alloc_node);
@@ -599,6 +605,7 @@ static void kmem_rcu_free(struct rcu_head *head)
 
 void kmem_cache_free(struct kmem_cache *c, void *b)
 {
+	memleak_free_recursive(b, c->flags);
 	if (unlikely(c->flags & SLAB_DESTROY_BY_RCU)) {
 		struct slob_rcu *slob_rcu;
 		slob_rcu = b + (c->size - sizeof(struct slob_rcu));



* [PATCH 05/15] kmemleak: Add the slub memory allocation/freeing hooks
  2008-11-29 10:43 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
                   ` (3 preceding siblings ...)
  2008-11-29 10:43 ` [PATCH 04/15] kmemleak: Add the slob " Catalin Marinas
@ 2008-11-29 10:43 ` Catalin Marinas
  2008-11-29 11:46   ` Pekka Enberg
  2008-11-29 10:43 ` [PATCH 06/15] kmemleak: Add the vmalloc " Catalin Marinas
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 35+ messages in thread
From: Catalin Marinas @ 2008-11-29 10:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Pekka Enberg, Ingo Molnar, Christoph Lameter

This patch adds the callbacks to memleak_(alloc|free) functions from the
slub allocator.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pekka Enberg <penberg@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
---
 mm/slub.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 7ad489a..b683571 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -18,6 +18,7 @@
 #include <linux/seq_file.h>
 #include <linux/cpu.h>
 #include <linux/cpuset.h>
+#include <linux/memleak.h>
 #include <linux/mempolicy.h>
 #include <linux/ctype.h>
 #include <linux/debugobjects.h>
@@ -140,7 +141,7 @@
  * Set of flags that will prevent slab merging
  */
 #define SLUB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
-		SLAB_TRACE | SLAB_DESTROY_BY_RCU)
+		SLAB_TRACE | SLAB_DESTROY_BY_RCU | SLAB_NOLEAKTRACE)
 
 #define SLUB_MERGE_SAME (SLAB_DEBUG_FREE | SLAB_RECLAIM_ACCOUNT | \
 		SLAB_CACHE_DMA)
@@ -1608,6 +1609,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
 	if (unlikely((gfpflags & __GFP_ZERO) && object))
 		memset(object, 0, objsize);
 
+	memleak_alloc_recursive(object, objsize, 1, s->flags);
 	return object;
 }
 
@@ -1710,6 +1712,7 @@ static __always_inline void slab_free(struct kmem_cache *s,
 	struct kmem_cache_cpu *c;
 	unsigned long flags;
 
+	memleak_free_recursive(x, s->flags);
 	local_irq_save(flags);
 	c = get_cpu_slab(s, smp_processor_id());
 	debug_check_no_locks_freed(object, c->objsize);



* [PATCH 06/15] kmemleak: Add the vmalloc memory allocation/freeing hooks
  2008-11-29 10:43 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
                   ` (4 preceding siblings ...)
  2008-11-29 10:43 ` [PATCH 05/15] kmemleak: Add the slub " Catalin Marinas
@ 2008-11-29 10:43 ` Catalin Marinas
  2008-11-29 10:43 ` [PATCH 07/15] kmemleak: Add memleak_alloc callback from alloc_large_system_hash Catalin Marinas
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2008-11-29 10:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

This patch adds the callbacks to memleak_(alloc|free) functions from
vmalloc/vfree.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 mm/vmalloc.c |   29 ++++++++++++++++++++++++++---
 1 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ba6b0f5..c0340ac 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -23,6 +23,7 @@
 #include <linux/rbtree.h>
 #include <linux/radix-tree.h>
 #include <linux/rcupdate.h>
+#include <linux/memleak.h>
 
 #include <asm/atomic.h>
 #include <asm/uaccess.h>
@@ -1173,6 +1174,9 @@ static void __vunmap(const void *addr, int deallocate_pages)
 void vfree(const void *addr)
 {
 	BUG_ON(in_interrupt());
+
+	memleak_free(addr);
+
 	__vunmap(addr, 1);
 }
 EXPORT_SYMBOL(vfree);
@@ -1282,8 +1286,17 @@ fail:
 
 void *__vmalloc_area(struct vm_struct *area, gfp_t gfp_mask, pgprot_t prot)
 {
-	return __vmalloc_area_node(area, gfp_mask, prot, -1,
-					__builtin_return_address(0));
+	void *addr = __vmalloc_area_node(area, gfp_mask, prot, -1,
+					 __builtin_return_address(0));
+
+	/*
+	 * This needs ref_count = 2 since vm_struct also contains a
+	 * pointer to this address. The guard page is also subtracted
+	 * from the size.
+	 */
+	memleak_alloc(addr, area->size - PAGE_SIZE, 2);
+
+	return addr;
 }
 
 /**
@@ -1302,6 +1315,8 @@ static void *__vmalloc_node(unsigned long size, gfp_t gfp_mask, pgprot_t prot,
 						int node, void *caller)
 {
 	struct vm_struct *area;
+	void *addr;
+	unsigned long real_size = size;
 
 	size = PAGE_ALIGN(size);
 	if (!size || (size >> PAGE_SHIFT) > num_physpages)
@@ -1313,7 +1328,15 @@ static void *__vmalloc_node(unsigned long size, gfp_t gfp_mask, pgprot_t prot,
 	if (!area)
 		return NULL;
 
-	return __vmalloc_area_node(area, gfp_mask, prot, node, caller);
+	addr = __vmalloc_area_node(area, gfp_mask, prot, node, caller);
+
+	/*
+	 * This needs ref_count = 2 since the vm_struct also contains
+	 * a pointer to this address.
+	 */
+	memleak_alloc(addr, real_size, 2);
+
+	return addr;
 }
 
 void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot)



* [PATCH 07/15] kmemleak: Add memleak_alloc callback from alloc_large_system_hash
  2008-11-29 10:43 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
                   ` (5 preceding siblings ...)
  2008-11-29 10:43 ` [PATCH 06/15] kmemleak: Add the vmalloc " Catalin Marinas
@ 2008-11-29 10:43 ` Catalin Marinas
  2008-11-29 10:43 ` [PATCH 08/15] kmemleak: Add modules support Catalin Marinas
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2008-11-29 10:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

The alloc_large_system_hash function is called from various places in
the kernel and it contains pointers to other allocated structures. It
therefore needs to be traced by kmemleak.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 mm/page_alloc.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d8ac014..90e7dbd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -46,6 +46,7 @@
 #include <linux/page-isolation.h>
 #include <linux/page_cgroup.h>
 #include <linux/debugobjects.h>
+#include <linux/memleak.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -4570,6 +4571,8 @@ void *__init alloc_large_system_hash(const char *tablename,
 	if (_hash_mask)
 		*_hash_mask = (1 << log2qty) - 1;
 
+	memleak_alloc(table, size, 1);
+
 	return table;
 }
 



* [PATCH 08/15] kmemleak: Add modules support
  2008-11-29 10:43 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
                   ` (6 preceding siblings ...)
  2008-11-29 10:43 ` [PATCH 07/15] kmemleak: Add memleak_alloc callback from alloc_large_system_hash Catalin Marinas
@ 2008-11-29 10:43 ` Catalin Marinas
  2008-11-29 10:43 ` [PATCH 09/15] x86: Provide _sdata in the vmlinux_*.lds.S files Catalin Marinas
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2008-11-29 10:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

This patch handles the kmemleak operations needed for modules loading so
that memory allocations from inside a module are properly tracked.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 kernel/module.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 1f4cc00..84a21a8 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -51,6 +51,7 @@
 #include <asm/sections.h>
 #include <linux/tracepoint.h>
 #include <linux/ftrace.h>
+#include <linux/memleak.h>
 
 #if 0
 #define DEBUGP printk
@@ -409,6 +410,7 @@ static void *percpu_modalloc(unsigned long size, unsigned long align,
 	unsigned long extra;
 	unsigned int i;
 	void *ptr;
+	int cpu;
 
 	if (align > PAGE_SIZE) {
 		printk(KERN_WARNING "%s: per-cpu alignment %li > %li\n",
@@ -438,6 +440,10 @@ static void *percpu_modalloc(unsigned long size, unsigned long align,
 			if (!split_block(i, size))
 				return NULL;
 
+		/* add the per-cpu scanning areas */
+		for_each_possible_cpu(cpu)
+			memleak_alloc(ptr + per_cpu_offset(cpu), size, 0);
+
 		/* Mark allocated */
 		pcpu_size[i] = -pcpu_size[i];
 		return ptr;
@@ -452,6 +458,7 @@ static void percpu_modfree(void *freeme)
 {
 	unsigned int i;
 	void *ptr = __per_cpu_start + block_size(pcpu_size[0]);
+	int cpu;
 
 	/* First entry is core kernel percpu data. */
 	for (i = 1; i < pcpu_num_used; ptr += block_size(pcpu_size[i]), i++) {
@@ -463,6 +470,10 @@ static void percpu_modfree(void *freeme)
 	BUG();
 
  free:
+	/* remove the per-cpu scanning areas */
+	for_each_possible_cpu(cpu)
+		memleak_free(freeme + per_cpu_offset(cpu));
+
 	/* Merge with previous? */
 	if (pcpu_size[i-1] >= 0) {
 		pcpu_size[i-1] += pcpu_size[i];
@@ -1833,6 +1844,36 @@ static void *module_alloc_update_bounds(unsigned long size)
 	return ret;
 }
 
+#ifdef CONFIG_DEBUG_MEMLEAK
+static void memleak_load_module(struct module *mod, Elf_Ehdr *hdr,
+				Elf_Shdr *sechdrs, char *secstrings)
+{
+	unsigned int i;
+
+	/* only scan the sections containing data */
+	memleak_scan_area(mod->module_core,
+			  (unsigned long)mod - (unsigned long)mod->module_core,
+			  sizeof(struct module));
+
+	for (i = 1; i < hdr->e_shnum; i++) {
+		if (!(sechdrs[i].sh_flags & SHF_ALLOC))
+			continue;
+		if (strncmp(secstrings + sechdrs[i].sh_name, ".data", 5) != 0
+		    && strncmp(secstrings + sechdrs[i].sh_name, ".bss", 4) != 0)
+			continue;
+
+		memleak_scan_area(mod->module_core,
+				  sechdrs[i].sh_addr - (unsigned long)mod->module_core,
+				  sechdrs[i].sh_size);
+	}
+}
+#else
+static inline void memleak_load_module(struct module *mod, Elf_Ehdr *hdr,
+				       Elf_Shdr *sechdrs, char *secstrings)
+{
+}
+#endif
+
 /* Allocate and load the module: note that size of section 0 is always
    zero, and we rely on this for optional sections. */
 static noinline struct module *load_module(void __user *umod,
@@ -2011,6 +2052,12 @@ static noinline struct module *load_module(void __user *umod,
 
 	/* Do the allocs. */
 	ptr = module_alloc_update_bounds(mod->core_size);
+	/*
+	 * The pointer to this block is stored in the module structure
+	 * which is inside the block. Just mark it as not being a
+	 * leak.
+	 */
+	memleak_not_leak(ptr);
 	if (!ptr) {
 		err = -ENOMEM;
 		goto free_percpu;
@@ -2019,6 +2066,13 @@ static noinline struct module *load_module(void __user *umod,
 	mod->module_core = ptr;
 
 	ptr = module_alloc_update_bounds(mod->init_size);
+	/*
+	 * The pointer to this block is stored in the module structure
+	 * which is inside the block. This block doesn't need to be
+	 * scanned as it contains data and code that will be freed
+	 * after the module is initialized.
+	 */
+	memleak_ignore(ptr);
 	if (!ptr && mod->init_size) {
 		err = -ENOMEM;
 		goto free_core;
@@ -2049,6 +2103,7 @@ static noinline struct module *load_module(void __user *umod,
 	}
 	/* Module has been moved. */
 	mod = (void *)sechdrs[modindex].sh_addr;
+	memleak_load_module(mod, hdr, sechdrs, secstrings);
 
 	/* Now we've moved module, initialize linked lists, etc. */
 	module_unload_init(mod);


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 09/15] x86: Provide _sdata in the vmlinux_*.lds.S files
  2008-11-29 10:43 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
                   ` (7 preceding siblings ...)
  2008-11-29 10:43 ` [PATCH 08/15] kmemleak: Add modules support Catalin Marinas
@ 2008-11-29 10:43 ` Catalin Marinas
  2008-11-29 10:44 ` [PATCH 10/15] arm: Provide _sdata and __bss_stop in the vmlinux.lds.S file Catalin Marinas
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2008-11-29 10:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

_sdata is a common symbol defined by many architectures and made
available to the kernel via asm-generic/sections.h. Kmemleak uses this
symbol when scanning the data sections.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/vmlinux_32.lds.S |    1 +
 arch/x86/kernel/vmlinux_64.lds.S |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/vmlinux_32.lds.S b/arch/x86/kernel/vmlinux_32.lds.S
index a9b8560..b5d2b49 100644
--- a/arch/x86/kernel/vmlinux_32.lds.S
+++ b/arch/x86/kernel/vmlinux_32.lds.S
@@ -62,6 +62,7 @@ SECTIONS
 
   /* writeable */
   . = ALIGN(PAGE_SIZE);
+  _sdata = .;			/* Start of data section */
   .data : AT(ADDR(.data) - LOAD_OFFSET) {	/* Data */
 	DATA_DATA
 	CONSTRUCTORS
diff --git a/arch/x86/kernel/vmlinux_64.lds.S b/arch/x86/kernel/vmlinux_64.lds.S
index 46e0544..8ad376c 100644
--- a/arch/x86/kernel/vmlinux_64.lds.S
+++ b/arch/x86/kernel/vmlinux_64.lds.S
@@ -52,6 +52,7 @@ SECTIONS
   RODATA
 
   . = ALIGN(PAGE_SIZE);		/* Align data segment to page size boundary */
+  _sdata = .;			/* Start of data section */
 				/* Data */
   .data : AT(ADDR(.data) - LOAD_OFFSET) {
 	DATA_DATA


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 10/15] arm: Provide _sdata and __bss_stop in the vmlinux.lds.S file
  2008-11-29 10:43 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
                   ` (8 preceding siblings ...)
  2008-11-29 10:43 ` [PATCH 09/15] x86: Provide _sdata in the vmlinux_*.lds.S files Catalin Marinas
@ 2008-11-29 10:44 ` Catalin Marinas
  2008-11-29 10:44 ` [PATCH 11/15] kmemleak: Remove some of the kmemleak false positives Catalin Marinas
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2008-11-29 10:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Russell King

_sdata and __bss_stop are common symbols defined by many architectures
and made available to the kernel via asm-generic/sections.h. Kmemleak
uses these symbols when scanning the data sections.

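Roughly how the scanner walks these sections, based on the scan_block()
helper from the base patch (a sketch, not the actual code):

	#include <asm/sections.h>

	static void scan_data_sections(void)
	{
		/* .data is _sdata.._edata, .bss is __bss_start..__bss_stop */
		scan_block(_sdata, _edata, NULL);
		scan_block(__bss_start, __bss_stop, NULL);
	}
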
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russell King <rmk+lkml@arm.linux.org.uk>
---
 arch/arm/kernel/vmlinux.lds.S |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 4898bdc..3cf1d44 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -120,6 +120,7 @@ SECTIONS
 
 	.data : AT(__data_loc) {
 		__data_start = .;	/* address in memory */
+		_sdata = .;
 
 		/*
 		 * first, the init task union, aligned
@@ -171,6 +172,7 @@ SECTIONS
 		__bss_start = .;	/* BSS				*/
 		*(.bss)
 		*(COMMON)
+		__bss_stop = .;
 		_end = .;
 	}
 					/* Stabs debugging sections.	*/


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 11/15] kmemleak: Remove some of the kmemleak false positives
  2008-11-29 10:43 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
                   ` (9 preceding siblings ...)
  2008-11-29 10:44 ` [PATCH 10/15] arm: Provide _sdata and __bss_stop in the vmlinux.lds.S file Catalin Marinas
@ 2008-11-29 10:44 ` Catalin Marinas
  2008-11-29 11:48   ` Pekka Enberg
  2008-11-29 10:44 ` [PATCH 12/15] kmemleak: Enable the building of the memory leak detector Catalin Marinas
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 35+ messages in thread
From: Catalin Marinas @ 2008-11-29 10:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

There are allocations for which the main pointer cannot be found but
they are not memory leaks. This patch fixes some of them. For more
information on false positives, see Documentation/kmemleak.txt.

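The percpu.h hunk below deals with pointer disguising; as a sketch of
why disguising defeats a conservative scanner (not part of this patch):

	static void example(void)
	{
		void *real = kmalloc(64, GFP_KERNEL);
		/* what the per-cpu code stores: the bitwise complement */
		void *disguised = (void *)~(unsigned long)real;

		/*
		 * A word-by-word scan only ever sees 'disguised', which
		 * falls inside no tracked object, so 'real' is reported
		 * as unreferenced even though it is still reachable.
		 */
	}
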
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 drivers/char/vt.c      |    7 +++++++
 include/linux/percpu.h |    5 +++++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/char/vt.c b/drivers/char/vt.c
index a5af607..bb024f8 100644
--- a/drivers/char/vt.c
+++ b/drivers/char/vt.c
@@ -104,6 +104,7 @@
 #include <linux/io.h>
 #include <asm/system.h>
 #include <linux/uaccess.h>
+#include <linux/memleak.h>
 
 #define MAX_NR_CON_DRIVER 16
 
@@ -2882,6 +2883,12 @@ static int __init con_init(void)
 	 */
 	for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++) {
 		vc_cons[currcons].d = vc = alloc_bootmem(sizeof(struct vc_data));
+		/*
+		 * Kmemleak does not track the memory allocated via
+		 * alloc_bootmem() but this block contains pointers to
+		 * other blocks allocated via kmalloc.
+		 */
+		memleak_alloc(vc, sizeof(struct vc_data), 1);
 		INIT_WORK(&vc_cons[currcons].SAK_work, vc_SAK);
 		visual_init(vc, currcons, 1);
 		vc->vc_screenbuf = (unsigned short *)alloc_bootmem(vc->vc_screenbuf_size);
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 9f2a375..4d1ce18 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -69,7 +69,12 @@ struct percpu_data {
 	void *ptrs[1];
 };
 
+/* pointer disguising messes up kmemleak's object tracking */
+#ifndef CONFIG_DEBUG_MEMLEAK
 #define __percpu_disguise(pdata) (struct percpu_data *)~(unsigned long)(pdata)
+#else
+#define __percpu_disguise(pdata) (struct percpu_data *)(pdata)
+#endif
 /* 
  * Use this to get to a cpu's version of the per-cpu object dynamically
  * allocated. Non-atomic access to the current CPU's version should


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 12/15] kmemleak: Enable the building of the memory leak detector
  2008-11-29 10:43 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
                   ` (10 preceding siblings ...)
  2008-11-29 10:44 ` [PATCH 11/15] kmemleak: Remove some of the kmemleak false positives Catalin Marinas
@ 2008-11-29 10:44 ` Catalin Marinas
  2008-11-29 10:44 ` [PATCH 13/15] kmemleak: Keep the __init functions after initialization Catalin Marinas
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2008-11-29 10:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

This patch adds the Kconfig.debug and Makefile entries needed for
building kmemleak into the kernel.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 lib/Kconfig.debug |   23 +++++++++++++++++++++++
 mm/Makefile       |    1 +
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index b0f239e..1e59827 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -290,6 +290,29 @@ config SLUB_STATS
 	  out which slabs are relevant to a particular load.
 	  Try running: slabinfo -DA
 
+config DEBUG_MEMLEAK
+	bool "Kernel memory leak detector"
+	default n
+	depends on EXPERIMENTAL
+	select DEBUG_SLAB if SLAB
+	select SLUB_DEBUG if SLUB
+	select DEBUG_FS
+	select STACKTRACE
+	select FRAME_POINTER
+	select KALLSYMS
+	help
+	  Say Y here if you want to enable the memory leak
+	  detector. Memory allocation/freeing is traced in a way
+	  similar to Boehm's conservative garbage collector, the
+	  difference being that orphan objects are not freed but
+	  only shown in /sys/kernel/debug/memleak. Enabling this
+	  feature will introduce some overhead to memory
+	  allocations. See Documentation/kmemleak.txt for more
+	  details.
+
+	  In order to access the memleak file, debugfs needs to be
+	  mounted (usually at /sys/kernel/debug).
+
 config DEBUG_PREEMPT
 	bool "Debug preemptible kernel"
 	depends on DEBUG_KERNEL && PREEMPT && (TRACE_IRQFLAGS_SUPPORT || PPC64)
diff --git a/mm/Makefile b/mm/Makefile
index c06b45a..3e43536 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -34,3 +34,4 @@ obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
 obj-$(CONFIG_QUICKLIST) += quicklist.o
 obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o
+obj-$(CONFIG_DEBUG_MEMLEAK) += memleak.o


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 13/15] kmemleak: Keep the __init functions after initialization
  2008-11-29 10:43 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
                   ` (11 preceding siblings ...)
  2008-11-29 10:44 ` [PATCH 12/15] kmemleak: Enable the building of the memory leak detector Catalin Marinas
@ 2008-11-29 10:44 ` Catalin Marinas
  2008-11-29 10:44 ` [PATCH 14/15] kmemleak: Simple testing module for kmemleak Catalin Marinas
  2008-11-29 10:44 ` [PATCH 15/15] kmemleak: Add the corresponding MAINTAINERS entry Catalin Marinas
  14 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2008-11-29 10:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

This patch adds the CONFIG_DEBUG_KEEP_INIT option, which preserves the
.init.* sections after initialization so that memory leaks happening
during this phase can be tracked more easily.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 include/linux/init.h |    6 ++++++
 lib/Kconfig.debug    |   12 ++++++++++++
 2 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/include/linux/init.h b/include/linux/init.h
index 68cb026..41321ad 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -40,9 +40,15 @@
 
 /* These are for everybody (although not all archs will actually
    discard it in modules) */
+#ifdef CONFIG_DEBUG_KEEP_INIT
+#define __init
+#define __initdata
+#define __initconst
+#else
 #define __init		__section(.init.text) __cold notrace
 #define __initdata	__section(.init.data)
 #define __initconst	__section(.init.rodata)
+#endif
 #define __exitdata	__section(.exit.data)
 #define __exit_call	__used __section(.exitcall.exit)
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1e59827..72cde77 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -313,6 +313,18 @@ config DEBUG_MEMLEAK
 	  In order to access the memleak file, debugfs needs to be
 	  mounted (usually at /sys/kernel/debug).
 
+config DEBUG_KEEP_INIT
+	bool "Do not free the __init code/data"
+	default n
+	depends on DEBUG_MEMLEAK
+	help
+	  This option moves the __init code/data out of the
+	  .init.text/.init.data sections. It is useful for identifying
+	  memory leaks happening during the kernel or modules
+	  initialization.
+
+	  If unsure, say N.
+
 config DEBUG_PREEMPT
 	bool "Debug preemptible kernel"
 	depends on DEBUG_KERNEL && PREEMPT && (TRACE_IRQFLAGS_SUPPORT || PPC64)


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 14/15] kmemleak: Simple testing module for kmemleak
  2008-11-29 10:43 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
                   ` (12 preceding siblings ...)
  2008-11-29 10:44 ` [PATCH 13/15] kmemleak: Keep the __init functions after initialization Catalin Marinas
@ 2008-11-29 10:44 ` Catalin Marinas
  2008-11-29 10:44 ` [PATCH 15/15] kmemleak: Add the corresponding MAINTAINERS entry Catalin Marinas
  14 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2008-11-29 10:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

This patch adds a loadable module that deliberately leaks memory. It
is used for testing various memory leaking scenarios.

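A typical session, assuming the debugfs interface from the base patch:

	# modprobe memleak-test
	# cat /sys/kernel/debug/memleak		(or wait for the periodic scan)
	# rmmod memleak-test
	# cat /sys/kernel/debug/memleak		(the list elements now show up)
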
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 lib/Kconfig.debug |   11 +++++
 mm/Makefile       |    1 
 mm/memleak-test.c |  109 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 121 insertions(+), 0 deletions(-)
 create mode 100644 mm/memleak-test.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 72cde77..205c1da 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -313,6 +313,17 @@ config DEBUG_MEMLEAK
 	  In order to access the memleak file, debugfs needs to be
 	  mounted (usually at /sys/kernel/debug).
 
+config DEBUG_MEMLEAK_TEST
+	tristate "Test the kernel memory leak detector"
+	default n
+	depends on DEBUG_MEMLEAK
+	help
+	  Say Y or M here to build a test for the kernel memory leak
+	  detector. This option enables a module that explicitly leaks
+	  memory.
+
+	  If unsure, say N.
+
 config DEBUG_KEEP_INIT
 	bool "Do not free the __init code/data"
 	default n
diff --git a/mm/Makefile b/mm/Makefile
index 3e43536..deb5935 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -35,3 +35,4 @@ obj-$(CONFIG_SMP) += allocpercpu.o
 obj-$(CONFIG_QUICKLIST) += quicklist.o
 obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o
 obj-$(CONFIG_DEBUG_MEMLEAK) += memleak.o
+obj-$(CONFIG_DEBUG_MEMLEAK_TEST) += memleak-test.o
diff --git a/mm/memleak-test.c b/mm/memleak-test.c
new file mode 100644
index 0000000..c7315af
--- /dev/null
+++ b/mm/memleak-test.c
@@ -0,0 +1,109 @@
+/*
+ * mm/memleak-test.c
+ *
+ * Copyright (C) 2008 ARM Limited
+ * Written by Catalin Marinas <catalin.marinas@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <linux/list.h>
+#include <linux/percpu.h>
+
+#include <linux/memleak.h>
+
+struct test_node {
+	long header[25];
+	struct list_head list;
+	long footer[25];
+};
+
+static LIST_HEAD(test_list);
+static DEFINE_PER_CPU(void *, test_pointer);
+
+/*
+ * Some very simple testing. This function needs to be extended for
+ * proper testing.
+ */
+static int __init memleak_test_init(void)
+{
+	struct test_node *elem;
+	int i;
+
+	printk(KERN_INFO "Kmemleak testing\n");
+
+	/* make some orphan objects */
+	pr_info("kmemleak: kmalloc(32) = %p\n", kmalloc(32, GFP_KERNEL));
+	pr_info("kmemleak: kmalloc(32) = %p\n", kmalloc(32, GFP_KERNEL));
+	pr_info("kmemleak: kmalloc(1024) = %p\n", kmalloc(1024, GFP_KERNEL));
+	pr_info("kmemleak: kmalloc(1024) = %p\n", kmalloc(1024, GFP_KERNEL));
+	pr_info("kmemleak: kmalloc(2048) = %p\n", kmalloc(2048, GFP_KERNEL));
+	pr_info("kmemleak: kmalloc(2048) = %p\n", kmalloc(2048, GFP_KERNEL));
+	pr_info("kmemleak: kmalloc(4096) = %p\n", kmalloc(4096, GFP_KERNEL));
+	pr_info("kmemleak: kmalloc(4096) = %p\n", kmalloc(4096, GFP_KERNEL));
+#ifndef CONFIG_MODULES
+	pr_info("kmemleak: kmem_cache_alloc(files_cachep) = %p\n",
+		kmem_cache_alloc(files_cachep, GFP_KERNEL));
+	pr_info("kmemleak: kmem_cache_alloc(files_cachep) = %p\n",
+		kmem_cache_alloc(files_cachep, GFP_KERNEL));
+#endif
+	pr_info("kmemleak: vmalloc(64) = %p\n", vmalloc(64));
+	pr_info("kmemleak: vmalloc(64) = %p\n", vmalloc(64));
+	pr_info("kmemleak: vmalloc(64) = %p\n", vmalloc(64));
+	pr_info("kmemleak: vmalloc(64) = %p\n", vmalloc(64));
+	pr_info("kmemleak: vmalloc(64) = %p\n", vmalloc(64));
+
+	/*
+	 * Add elements to a list. They should only appear as orphans
+	 * after the module is removed.
+	 */
+	for (i = 0; i < 10; i++) {
+		elem = kmalloc(sizeof(*elem), GFP_KERNEL);
+		pr_info("kmemleak: kmalloc(sizeof(*elem)) = %p\n", elem);
+		if (!elem)
+			return -ENOMEM;
+		memset(elem, 0, sizeof(*elem));
+		INIT_LIST_HEAD(&elem->list);
+
+		list_add_tail(&elem->list, &test_list);
+	}
+
+	for_each_possible_cpu(i) {
+		per_cpu(test_pointer, i) = kmalloc(129, GFP_KERNEL);
+		pr_info("kmemleak: kmalloc(129) = %p\n", per_cpu(test_pointer, i));
+	}
+
+	return 0;
+}
+module_init(memleak_test_init);
+
+static void __exit memleak_test_exit(void)
+{
+	struct test_node *elem, *tmp;
+
+	/*
+	 * Remove the list elements without actually freeing the
+	 * memory.
+	 */
+	list_for_each_entry_safe(elem, tmp, &test_list, list)
+		list_del(&elem->list);
+}
+module_exit(memleak_test_exit);
+
+MODULE_LICENSE("GPL");


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 15/15] kmemleak: Add the corresponding MAINTAINERS entry
  2008-11-29 10:43 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
                   ` (13 preceding siblings ...)
  2008-11-29 10:44 ` [PATCH 14/15] kmemleak: Simple testing module for kmemleak Catalin Marinas
@ 2008-11-29 10:44 ` Catalin Marinas
  14 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2008-11-29 10:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 MAINTAINERS |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 627e4c8..06c2a3d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2502,6 +2502,12 @@ L:	kernel-janitors@vger.kernel.org
 W:	http://www.kerneljanitors.org/
 S:	Maintained
 
+KERNEL MEMORY LEAK DETECTOR
+P:	Catalin Marinas
+M:	catalin.marinas@arm.com
+L:	linux-kernel@vger.kernel.org
+S:	Maintained
+
 KERNEL NFSD, SUNRPC, AND LOCKD SERVERS
 P:	J. Bruce Fields
 M:	bfields@fieldses.org


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 05/15] kmemleak: Add the slub memory allocation/freeing hooks
  2008-11-29 10:43 ` [PATCH 05/15] kmemleak: Add the slub " Catalin Marinas
@ 2008-11-29 11:46   ` Pekka Enberg
  0 siblings, 0 replies; 35+ messages in thread
From: Pekka Enberg @ 2008-11-29 11:46 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-kernel, Ingo Molnar, Christoph Lameter

Hi Catalin,

Please use the penberg@cs.helsinki.fi email address when cc'ing me.

On Sat, Nov 29, 2008 at 12:43 PM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> This patch adds the callbacks to memleak_(alloc|free) functions from the
> slub allocator.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Pekka Enberg <penberg@gmail.com>
> Cc: Christoph Lameter <cl@linux-foundation.org>

Looks good to me.

Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>

> ---
>  mm/slub.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 7ad489a..b683571 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -18,6 +18,7 @@
>  #include <linux/seq_file.h>
>  #include <linux/cpu.h>
>  #include <linux/cpuset.h>
> +#include <linux/memleak.h>
>  #include <linux/mempolicy.h>
>  #include <linux/ctype.h>
>  #include <linux/debugobjects.h>
> @@ -140,7 +141,7 @@
>  * Set of flags that will prevent slab merging
>  */
>  #define SLUB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
> -               SLAB_TRACE | SLAB_DESTROY_BY_RCU)
> +               SLAB_TRACE | SLAB_DESTROY_BY_RCU | SLAB_NOLEAKTRACE)
>
>  #define SLUB_MERGE_SAME (SLAB_DEBUG_FREE | SLAB_RECLAIM_ACCOUNT | \
>                SLAB_CACHE_DMA)
> @@ -1608,6 +1609,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
>        if (unlikely((gfpflags & __GFP_ZERO) && object))
>                memset(object, 0, objsize);
>
> +       memleak_alloc_recursive(object, objsize, 1, s->flags);
>        return object;
>  }
>
> @@ -1710,6 +1712,7 @@ static __always_inline void slab_free(struct kmem_cache *s,
>        struct kmem_cache_cpu *c;
>        unsigned long flags;
>
> +       memleak_free_recursive(x, s->flags);
>        local_irq_save(flags);
>        c = get_cpu_slab(s, smp_processor_id());
>        debug_check_no_locks_freed(object, c->objsize);
>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 11/15] kmemleak: Remove some of the kmemleak false positives
  2008-11-29 10:44 ` [PATCH 11/15] kmemleak: Remove some of the kmemleak false positives Catalin Marinas
@ 2008-11-29 11:48   ` Pekka Enberg
  2008-11-30 19:02     ` Catalin Marinas
  0 siblings, 1 reply; 35+ messages in thread
From: Pekka Enberg @ 2008-11-29 11:48 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-kernel, Ingo Molnar

Hi Catalin,


On Sat, Nov 29, 2008 at 12:44 PM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> There are allocations for which the main pointer cannot be found but
> they are not memory leaks. This patch fixes some of them. For more
> information on false positives, see Documentation/kmemleak.txt.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> ---
>  drivers/char/vt.c      |    7 +++++++
>  include/linux/percpu.h |    5 +++++
>  2 files changed, 12 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/char/vt.c b/drivers/char/vt.c
> index a5af607..bb024f8 100644
> --- a/drivers/char/vt.c
> +++ b/drivers/char/vt.c
> @@ -104,6 +104,7 @@
>  #include <linux/io.h>
>  #include <asm/system.h>
>  #include <linux/uaccess.h>
> +#include <linux/memleak.h>
>
>  #define MAX_NR_CON_DRIVER 16
>
> @@ -2882,6 +2883,12 @@ static int __init con_init(void)
>         */
>        for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++) {
>                vc_cons[currcons].d = vc = alloc_bootmem(sizeof(struct vc_data));
> +               /*
> +                * Kmemleak does not track the memory allocated via
> +                * alloc_bootmem() but this block contains pointers to
> +                * other blocks allocated via kmalloc.
> +                */
> +               memleak_alloc(vc, sizeof(struct vc_data), 1);

Can we add some hooks to alloc_bootmem() to handle this? It's somewhat
unfortunate that we need to annotate driver code.

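Something along these lines at the point in mm/bootmem.c where an
allocation succeeds, perhaps (fragment only; the exact hook placement
is the open question):

	if (ptr)
		/* annotate once here instead of in every driver */
		memleak_alloc(ptr, size, 1);
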
>                INIT_WORK(&vc_cons[currcons].SAK_work, vc_SAK);
>                visual_init(vc, currcons, 1);
>                vc->vc_screenbuf = (unsigned short *)alloc_bootmem(vc->vc_screenbuf_size);

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 04/15] kmemleak: Add the slob memory allocation/freeing hooks
  2008-11-29 10:43 ` [PATCH 04/15] kmemleak: Add the slob " Catalin Marinas
@ 2008-11-30 17:30   ` Matt Mackall
  0 siblings, 0 replies; 35+ messages in thread
From: Matt Mackall @ 2008-11-30 17:30 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-kernel, Pekka Enberg, Ingo Molnar

On Sat, 2008-11-29 at 10:43 +0000, Catalin Marinas wrote:
> This patch adds the callbacks to memleak_(alloc|free) functions from the
> slob allocator.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Pekka Enberg <penberg@gmail.com>
> Cc: Matt Mackall <mpm@selenic.com>

Acked-by: Matt Mackall <mpm@selenic.com>

-- 
Mathematics is the supreme nostalgia of our time.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 11/15] kmemleak: Remove some of the kmemleak false positives
  2008-11-29 11:48   ` Pekka Enberg
@ 2008-11-30 19:02     ` Catalin Marinas
  0 siblings, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2008-11-30 19:02 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: linux-kernel, Ingo Molnar

Hi Pekka,

On Sat, 2008-11-29 at 13:48 +0200, Pekka Enberg wrote:
> On Sat, Nov 29, 2008 at 12:44 PM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > There are allocations for which the main pointer cannot be found but
> > they are not memory leaks. This patch fixes some of them. For more
> > information on false positives, see Documentation/kmemleak.txt.
[...]
> > --- a/drivers/char/vt.c
> > +++ b/drivers/char/vt.c
[...]
> > @@ -2882,6 +2883,12 @@ static int __init con_init(void)
> >         */
> >        for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++) {
> >                vc_cons[currcons].d = vc = alloc_bootmem(sizeof(struct vc_data));
> > +               /*
> > +                * Kmemleak does not track the memory allocated via
> > +                * alloc_bootmem() but this block contains pointers to
> > +                * other blocks allocated via kmalloc.
> > +                */
> > +               memleak_alloc(vc, sizeof(struct vc_data), 1);
> 
> Can we add some hooks to alloc_bootmem() to handle this? It's somewhat
> unfortunate that we need to annotate driver code.

I did a quick grep for the alloc_bootmem() uses in the kernel and I
don't think such hooks would increase the chance of false negatives.
I'll give it a try.

Thanks.

-- 
Catalin


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/15] kmemleak: Add the base support
  2008-11-29 10:43 ` [PATCH 01/15] kmemleak: Add the base support Catalin Marinas
@ 2008-12-01  5:39   ` Yasunori Goto
  2008-12-01  6:49     ` Pekka Enberg
  2008-12-01  7:15   ` Pekka Enberg
  2008-12-02 21:28   ` Andrew Morton
  2 siblings, 1 reply; 35+ messages in thread
From: Yasunori Goto @ 2008-12-01  5:39 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-kernel, Ingo Molnar

Hello.

I have a question (it may be silly).

> +/*
> + * Insert a pointer into the pointer hash table.
> + */
> +static inline void create_object(unsigned long ptr, size_t size, int ref_count)
> +{
> +	unsigned long flags;
> +	struct memleak_object *object;
> +	struct prio_tree_node *node;
> +	struct stack_trace trace;
> +
> +	object = kmem_cache_alloc(object_cache, GFP_ATOMIC);
> +	if (!object)
> +		panic("kmemleak: cannot allocate a memleak_object structure\n");

IIRC, GFP_ATOMIC allocations sometimes fail (e.g. when the page cache
occupies all of memory), so this panic seems easy to trigger. Is that intended?


Thanks.
-- 
Yasunori Goto 



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/15] kmemleak: Add the base support
  2008-12-01  5:39   ` Yasunori Goto
@ 2008-12-01  6:49     ` Pekka Enberg
  2008-12-01  8:11       ` Yasunori Goto
  2008-12-01 12:29       ` Catalin Marinas
  0 siblings, 2 replies; 35+ messages in thread
From: Pekka Enberg @ 2008-12-01  6:49 UTC (permalink / raw)
  To: Yasunori Goto; +Cc: Catalin Marinas, linux-kernel, Ingo Molnar

Hi!

On Mon, Dec 1, 2008 at 7:39 AM, Yasunori Goto <y-goto@jp.fujitsu.com> wrote:
>> +/*
>> + * Insert a pointer into the pointer hash table.
>> + */
>> +static inline void create_object(unsigned long ptr, size_t size, int ref_count)
>> +{
>> +     unsigned long flags;
>> +     struct memleak_object *object;
>> +     struct prio_tree_node *node;
>> +     struct stack_trace trace;
>> +
>> +     object = kmem_cache_alloc(object_cache, GFP_ATOMIC);
>> +     if (!object)
>> +             panic("kmemleak: cannot allocate a memleak_object structure\n");
>
> IIRC, GFP_ATOMIC allocations sometimes fail (e.g. when the page cache
> occupies all of memory), so this panic seems easy to trigger. Is that intended?

Yup, GFP_ATOMIC can fail, as can any memory allocation under
out-of-memory conditions, unless you specify __GFP_NOFAIL, which will
either succeed or lock up the box. I think you can just WARN_ON() here?
However, it's probably safer to pass the gfp flags in from the callers;
otherwise we end up doing tons of GFP_ATOMIC allocations, which is not
healthy in general.

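Roughly what I have in mind (sketch only, not tested; the flags would
probably need masking down to the basic allocation bits):

	static struct memleak_object *create_object(unsigned long ptr,
						    size_t size, int ref_count,
						    gfp_t gfp)
	{
		struct memleak_object *object;

		/* inherit the tracked allocation's gfp flags */
		object = kmem_cache_alloc(object_cache, gfp);
		if (!object) {
			WARN_ON(1);	/* warn, don't take the box down */
			return NULL;
		}
		/* ... rest of the initialisation as before ... */
		return object;
	}
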
Also, I see some other BUG_ON() calls in the code which probably
should be converted to WARN_ON() as well.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/15] kmemleak: Add the base support
  2008-11-29 10:43 ` [PATCH 01/15] kmemleak: Add the base support Catalin Marinas
  2008-12-01  5:39   ` Yasunori Goto
@ 2008-12-01  7:15   ` Pekka Enberg
  2008-12-02 21:28   ` Andrew Morton
  2 siblings, 0 replies; 35+ messages in thread
From: Pekka Enberg @ 2008-12-01  7:15 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-kernel, Ingo Molnar

Hi Catalin,

On Sat, Nov 29, 2008 at 12:43 PM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> +/*
> + * Insert a pointer into the pointer hash table.
> + */
> +static inline void create_object(unsigned long ptr, size_t size, int ref_count)
> +{

[...]

> +       if (ptr < min_addr)
> +               min_addr = ptr;
> +       if (ptr + size > max_addr)
> +               max_addr = ptr + size;
> +       /*
> +        * Update the boundaries before inserting the object in the
> +        * prio search tree.
> +        */
> +       smp_mb();

I'm not sure I understand the purpose of this memory barrier. As soon
as some other CPU acquires object_tree_lock, updates to the boundaries
will be visible due to the implicit memory barriers in locking
functions (see Documentation/memory-barriers.txt for details).

However, I'm wondering why this isn't a smp_wmb() and..

> +/*
> + * Scan a block of memory (exclusive range) for pointers and move
> + * those found to the gray list.
> + */
> +static void scan_block(void *_start, void *_end, struct memleak_object *scanned)
> +{
> +       unsigned long *ptr;
> +       unsigned long *start = PTR_ALIGN(_start, BYTES_PER_WORD);
> +       unsigned long *end = _end - (BYTES_PER_WORD - 1);
> +
> +       for (ptr = start; ptr < end; ptr++) {

...why don't we have the pairing smp_rmb() here before we read
min_addr and max_addr?

> +
> +               /*
> +                * The boundaries check doesn't need to be precise
> +                * (hence no locking) since orphan objects need to
> +                * pass a scanning threshold before being reported.
> +                */
> +               if (pointer < min_addr || pointer >= max_addr)
> +                       continue;

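The pairing I have in mind, as a minimal sketch:

	/* writer side (create_object) */
	if (ptr < min_addr)
		min_addr = ptr;
	if (ptr + size > max_addr)
		max_addr = ptr + size;
	smp_wmb();	/* publish the boundaries before the tree insert */

	/* reader side (scan_block), inside the loop */
	smp_rmb();	/* pairs with the writer's smp_wmb() */
	if (pointer < min_addr || pointer >= max_addr)
		continue;
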
                Pekka

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/15] kmemleak: Add the base support
  2008-12-01  6:49     ` Pekka Enberg
@ 2008-12-01  8:11       ` Yasunori Goto
  2008-12-01 12:29       ` Catalin Marinas
  1 sibling, 0 replies; 35+ messages in thread
From: Yasunori Goto @ 2008-12-01  8:11 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-kernel, Ingo Molnar, Pekka Enberg

> Hi!
> 
> On Mon, Dec 1, 2008 at 7:39 AM, Yasunori Goto <y-goto@jp.fujitsu.com> wrote:
> >> +/*
> >> + * Insert a pointer into the pointer hash table.
> >> + */
> >> +static inline void create_object(unsigned long ptr, size_t size, int ref_count)
> >> +{
> >> +     unsigned long flags;
> >> +     struct memleak_object *object;
> >> +     struct prio_tree_node *node;
> >> +     struct stack_trace trace;
> >> +
> >> +     object = kmem_cache_alloc(object_cache, GFP_ATOMIC);
> >> +     if (!object)
> >> +             panic("kmemleak: cannot allocate a memleak_object structure\n");
> >
> > IIRC, GFP_ATOMIC allocations sometimes fail (e.g. when the page cache
> > occupies all of memory), so this panic seems easy to trigger. Is that intended?
> 
> Yup, GFP_ATOMIC can fail, as can any memory allocation under
> out-of-memory conditions, unless you specify __GFP_NOFAIL, which will
> either succeed or lock up the box. I think you can just WARN_ON() here?
> However, it's probably safer to pass the gfp flags in from the callers;
> otherwise we end up doing tons of GFP_ATOMIC allocations, which is not
> healthy in general.

I agree. It is reasonable to pass the gfp flags in from the caller.

Thanks.

> 
> Also, I see some other BUG_ON() calls in the code which probably
> should be converted to WARN_ON() as well.

-- 
Yasunori Goto 



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/15] kmemleak: Add the base support
  2008-12-01  6:49     ` Pekka Enberg
  2008-12-01  8:11       ` Yasunori Goto
@ 2008-12-01 12:29       ` Catalin Marinas
  1 sibling, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2008-12-01 12:29 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Yasunori Goto, linux-kernel, Ingo Molnar

On Mon, 2008-12-01 at 08:49 +0200, Pekka Enberg wrote:
> On Mon, Dec 1, 2008 at 7:39 AM, Yasunori Goto <y-goto@jp.fujitsu.com> wrote:
> >> +/*
> >> + * Insert a pointer into the pointer hash table.
> >> + */
> >> +static inline void create_object(unsigned long ptr, size_t size, int ref_count)
> >> +{
> >> +     unsigned long flags;
> >> +     struct memleak_object *object;
> >> +     struct prio_tree_node *node;
> >> +     struct stack_trace trace;
> >> +
> >> +     object = kmem_cache_alloc(object_cache, GFP_ATOMIC);
> >> +     if (!object)
> >> +             panic("kmemleak: cannot allocate a memleak_object structure\n");
> >
> > IIRC, GFP_ATOMIC allocations sometimes fail (e.g. when the page cache
> > occupies all of memory), so this panic seems easy to trigger. Is that intended?
> 
> Yup, GFP_ATOMIC can fail, as can any memory allocation under
> out-of-memory conditions, unless you specify __GFP_NOFAIL, which will
> either succeed or lock up the box. I think you can just WARN_ON() here?
> However, it's probably safer to pass the gfp flags in from the callers;
> otherwise we end up doing tons of GFP_ATOMIC allocations, which is not
> healthy in general.

Yes, good idea, GFP_ATOMIC isn't always needed.

> Also, I see some other BUG_ON() calls in the code which probably
> should be converted to WARN_ON() as well.

This was also raised in the past by Ingo and it's on my to-do list to
fix. Basically, if an allocation for internal kmemleak structures fails,
kmemleak should print a warning and disable the tracing since further
leak reports are irrelevant. Once I add the run-time kmemleak
configuration/disabling, I'll change the BUG_ON and panic calls to
something less severe.

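Something like this (sketch only; 'gfp' would be passed in as Pekka
suggests, and memleak_disable() is a helper that does not exist yet):

	object = kmem_cache_alloc(object_cache, gfp);
	if (!object) {
		pr_warning("kmemleak: cannot allocate internal object, "
			   "disabling leak tracing\n");
		memleak_disable();
		return;
	}
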
-- 
Catalin


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/15] kmemleak: Add the base support
  2008-11-29 10:43 ` [PATCH 01/15] kmemleak: Add the base support Catalin Marinas
  2008-12-01  5:39   ` Yasunori Goto
  2008-12-01  7:15   ` Pekka Enberg
@ 2008-12-02 21:28   ` Andrew Morton
  2008-12-04 18:50     ` Catalin Marinas
  2008-12-18  9:28     ` Catalin Marinas
  2 siblings, 2 replies; 35+ messages in thread
From: Andrew Morton @ 2008-12-02 21:28 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-kernel, mingo

On Sat, 29 Nov 2008 10:43:12 +0000
Catalin Marinas <catalin.marinas@arm.com> wrote:

> This patch adds the base support for the kernel memory leak
> detector. It traces the memory allocation/freeing in a way similar to
> Boehm's conservative garbage collector, the difference being that
> the unreferenced objects are not freed but only shown in
> /sys/kernel/debug/memleak. Enabling this feature introduces an
> overhead to memory allocations.
> 

Please feed all the diffs through checkpatch.  You might decide to
ignore some of the warnings, but that script does detect things which
you did not intend to add.

> ---
>  include/linux/memleak.h |   87 ++++
>  init/main.c             |    4 
>  mm/memleak.c            | 1019 +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 1109 insertions(+), 1 deletions(-)
>  create mode 100644 include/linux/memleak.h
>  create mode 100644 mm/memleak.c
> 
> diff --git a/include/linux/memleak.h b/include/linux/memleak.h
> new file mode 100644
> index 0000000..2756a05
> --- /dev/null
> +++ b/include/linux/memleak.h
> @@ -0,0 +1,87 @@
> +/*
> + * include/linux/memleak.h
> + *
> + * Copyright (C) 2008 ARM Limited
> + * Written by Catalin Marinas <catalin.marinas@arm.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> + */
> +
> +#ifndef __MEMLEAK_H
> +#define __MEMLEAK_H
> +
> +#ifdef CONFIG_DEBUG_MEMLEAK
> +
> +extern void memleak_init(void);
> +extern void memleak_alloc(const void *ptr, size_t size, int ref_count);
> +extern void memleak_free(const void *ptr);
> +extern void memleak_padding(const void *ptr, unsigned long offset, size_t size);
> +extern void memleak_not_leak(const void *ptr);
> +extern void memleak_ignore(const void *ptr);
> +extern void memleak_scan_area(const void *ptr, unsigned long offset, size_t length);
> +
> +static inline void memleak_alloc_recursive(const void *ptr, size_t size,
> +					   int ref_count, unsigned long flags)
> +{
> +	if (!(flags & SLAB_NOLEAKTRACE))
> +		memleak_alloc(ptr, size, ref_count);
> +}
> +
> +static inline void memleak_free_recursive(const void *ptr, unsigned long flags)
> +{
> +	if (!(flags & SLAB_NOLEAKTRACE))
> +		memleak_free(ptr);
> +}
> +
> +static inline void memleak_erase(void **ptr)
> +{
> +	*ptr = NULL;
> +}
> +
> +#else
> +
> +#define DECLARE_MEMLEAK_OFFSET(name, type, member)
> +
> +static inline void memleak_init(void)
> +{
> +}
> +static inline void memleak_alloc(const void *ptr, size_t size, int ref_count)
> +{
> +}
> +static inline void memleak_alloc_recursive(const void *ptr, size_t size,
> +					   int ref_count, unsigned long flags)
> +{
> +}
> +static inline void memleak_free(const void *ptr)
> +{
> +}
> +static inline void memleak_free_recursive(const void *ptr, unsigned long flags)
> +{
> +}
> +static inline void memleak_not_leak(const void *ptr)
> +{
> +}
> +static inline void memleak_ignore(const void *ptr)
> +{
> +}
> +static inline void memleak_scan_area(const void *ptr, unsigned long offset, size_t length)
> +{
> +}
> +static inline void memleak_erase(void **ptr)
> +{
> +}
> +
> +#endif	/* CONFIG_DEBUG_MEMLEAK */
> +
> +#endif	/* __MEMLEAK_H */
> diff --git a/init/main.c b/init/main.c
> index 7e117a2..81cbbb7 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -56,6 +56,7 @@
>  #include <linux/debug_locks.h>
>  #include <linux/debugobjects.h>
>  #include <linux/lockdep.h>
> +#include <linux/memleak.h>
>  #include <linux/pid_namespace.h>
>  #include <linux/device.h>
>  #include <linux/kthread.h>
> @@ -653,6 +654,8 @@ asmlinkage void __init start_kernel(void)
>  	enable_debug_pagealloc();
>  	cpu_hotplug_init();
>  	kmem_cache_init();
> +	prio_tree_init();
> +	memleak_init();
>  	debug_objects_mem_init();
>  	idr_init_cache();
>  	setup_per_cpu_pageset();
> @@ -662,7 +665,6 @@ asmlinkage void __init start_kernel(void)
>  	calibrate_delay();
>  	pidmap_init();
>  	pgtable_cache_init();
> -	prio_tree_init();

Yeah.

prio_tree_init() could be done at compile-time, even.

>  	anon_vma_init();
>  #ifdef CONFIG_X86
>  	if (efi_enabled)
> diff --git a/mm/memleak.c b/mm/memleak.c
> new file mode 100644
> index 0000000..1b09ca2
> --- /dev/null
> +++ b/mm/memleak.c
> @@ -0,0 +1,1019 @@
> +/*
> + * mm/memleak.c
> + *
> + * Copyright (C) 2008 ARM Limited
> + * Written by Catalin Marinas <catalin.marinas@arm.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> + */
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/list.h>
> +#include <linux/sched.h>
> +#include <linux/jiffies.h>
> +#include <linux/delay.h>
> +#include <linux/module.h>
> +#include <linux/kthread.h>
> +#include <linux/prio_tree.h>
> +#include <linux/gfp.h>
> +#include <linux/kallsyms.h>
> +#include <linux/debugfs.h>
> +#include <linux/seq_file.h>
> +#include <linux/cpumask.h>
> +#include <linux/spinlock.h>
> +#include <linux/mutex.h>
> +#include <linux/rcupdate.h>
> +#include <linux/stacktrace.h>
> +#include <linux/cache.h>
> +#include <linux/percpu.h>
> +#include <linux/hardirq.h>
> +#include <linux/mmzone.h>
> +#include <linux/slab.h>
> +
> +#include <asm/sections.h>
> +#include <asm/processor.h>
> +#include <asm/thread_info.h>
> +#include <asm/atomic.h>
> +
> +#include <linux/memleak.h>
> +
> +/*
> + * Kmemleak configuration and common defines.
> + */
> +#define MAX_TRACE		16	/* stack trace length */
> +#define REPORTS_NR		100	/* maximum number of reported leaks */
> +#define MSECS_MIN_AGE		1000	/* minimum object age for leak reporting */ 
> +#undef SCAN_TASK_STACKS			/* scan the task kernel stacks */
> +#undef REPORT_ORPHAN_FREEING		/* notify when freeing orphan objects */
> +
> +#define BYTES_PER_WORD		sizeof(void *)

This isn't a good idea.

If we need a kernel-wide macro for this then please propose one.

If someone later adds one, they might call it BYTES_PER_WORD, in which
case we'll get duplication or build breakage.

BYTES_PER_POINTER might be a better name.

> +#define MSECS_SCAN_YIELD	10
> +#define SECS_FIRST_SCAN		60
> +#define SECS_SCAN_PERIOD	600
> +
> +/* scanning area inside a memory block */
> +struct memleak_scan_area {
> +	struct hlist_node node;
> +	unsigned long offset;
> +	size_t length;
> +};
> +
> +/* the main allocation tracking object */
> +struct memleak_object {
> +	spinlock_t lock;
> +	unsigned long flags;
> +	struct list_head object_list;
> +	struct list_head gray_list;
> +	struct prio_tree_node tree_node;
> +	struct rcu_head rcu;		/* used for object_list lockless traversal */
> +	atomic_t use_count;		/* internal usage count */
> +	unsigned long pointer;
> +	size_t size;
> +	int ref_count;			/* the minimum encounters of the pointer */
> +	int count;			/* the ecounters of the pointer */

Was that supposed to read "encounters"?

The reader of the above documentation will be wondering what an
"encounter" is.

`count', `ref_count' and `use_count' are getting confusing...

> +	struct hlist_head area_list;	/* areas to be scanned (or empty for all) */
> +	unsigned long trace[MAX_TRACE];
> +	unsigned int trace_len;
> +	unsigned long jiffies;		/* creation timestamp */
> +	pid_t pid;			/* pid of the current task */

Two tasks in different pid namespaces can have the same pid.  What
effect will that have upon kmemleak?

> +	char comm[TASK_COMM_LEN];	/* executable name */
> +};
> +
> +/* The list of all allocated objects */
> +static LIST_HEAD(object_list);
> +static DEFINE_SPINLOCK(object_list_lock);
> +/* The list of the gray objects */
> +static LIST_HEAD(gray_list);

It would be nice to briefly define the term "gray" before using it.

> +/* prio search tree for object boundaries */
> +static struct prio_tree_root object_tree_root;
> +static DEFINE_RWLOCK(object_tree_lock);
> +
> +/* allocation caches */
> +static struct kmem_cache *object_cache;
> +static struct kmem_cache *scan_area_cache;
> +
> +static atomic_t memleak_enabled = ATOMIC_INIT(0);
> +
> +/* minimum and maximum address that may be valid pointers */
> +static unsigned long min_addr = ~0;

<writes a test program>

OK, I think you got lucky here.

"0" is a signed 32-bit integer, having the all-zeroes pattern.

"~0" is a signed 32-bit integer, having the all-ones bit pattern.

When that gets assigned to an unsigned long it gets sign-extended
first, so you indeed ended up with the 64-bit all-ones pattern.

All very tricky!  I like to use plain old "-1" for the all-ones pattern
and let the compiler do the work.  It always works.

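To spell it out (user-space sketch, 64-bit):

	unsigned long a = ~0;	/* int -1, sign-extended: works, but subtle */
	unsigned long b = -1;	/* the spelling that always works */
	unsigned long c = ~0U;	/* 0xffffffff only, NOT all ones on 64-bit */
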
> +static unsigned long max_addr;
> +
> +/* used for yielding the CPU to other tasks during scanning */
> +static unsigned long next_scan_yield;
> +static struct task_struct *scan_thread;
> +static DEFINE_MUTEX(scan_mutex);
> +static int reported_leaks;
> +static int scan_should_stop;
> +
> +static unsigned long jiffies_scan_yield;
> +static unsigned long jiffies_min_age;
> +
> +/*
> + * Early object allocation (before kmemleak is fully initialised).
> + */
> +#define EARLY_LOG_NR		200
> +
> +enum {
> +	MEMLEAK_ALLOC,
> +	MEMLEAK_FREE,
> +	MEMLEAK_NOT_LEAK,
> +	MEMLEAK_IGNORE,
> +	MEMLEAK_SCAN_AREA,
> +};

It looks to me like the above states are pretty important to
understanding the code.  Perhaps they each should be documented?

> +struct early_log {
> +	int op_type;			/* kmemleak operation type */
> +	const void *ptr;
> +	size_t size;
> +	int ref_count;
> +	unsigned long offset;
> +	size_t length;
> +};

Undocumented?  What's it for, and what do the fields do?

> +static struct early_log __initdata early_log[EARLY_LOG_NR];

You could in fact remove EARLY_LOG_NR.  Just use "200" here, and
ARRAY_SIZE() below.  Whatever.

> +static int __initdata crt_early_log;
> +
> +/* object flags */
> +#define OBJECT_ALLOCATED	(1 << 0)
> +#define OBJECT_REPORTED		(1 << 1)
> +
> +/*
> + * Object colors, encoded with count and ref_count:
> + *   - white - orphan object, i.e. not enough references to it (ref_count >= 1)
> + *   - gray  - referred at least once and therefore non-orphan (ref_count == 0)
> + *   - black - ignore; it doesn't contain references (text section) (ref_count == -1).
> + * Newly created objects don't have any color (object->count == -1) before
> + * the next memory scan when they become white.
> + */

ah, there we go.

Please do fit the block comments into 80-columns.

> +static inline int color_white(const struct memleak_object *object)
> +{
> +	return object->count != -1 && object->count < object->ref_count;
> +}
> +
> +static inline int color_gray(const struct memleak_object *object)
> +{
> +	return object->ref_count != -1 && object->count >= object->ref_count;
> +}
> +
> +static inline int color_black(const struct memleak_object *object)
> +{
> +	return object->ref_count == -1;
> +}
> +
> +static inline int unreferenced_object(struct memleak_object *object)
> +{
> +	if (color_white(object) &&
> +	    (object->flags & OBJECT_ALLOCATED) &&
> +	    time_is_before_eq_jiffies(object->jiffies + jiffies_min_age))
> +		return 1;
> +	else
> +		return 0;
> +}
> +
> +#define print_seq(seq, x...)			\
> +do {						\
> +	if (seq)				\
> +		seq_printf(seq, x);		\
> +	else					\
> +		pr_info(x);			\
> +} while (0)
> +
> +static void print_unreferenced(struct seq_file *seq,
> +			       struct memleak_object *object)
> +{
> +	char namebuf[KSYM_NAME_LEN + 1] = "";
> +	char *modname;
> +	unsigned long symsize;
> +	unsigned long offset = 0;

Initialised because gcc is stupid :(

> +	int i;
> +
> +	print_seq(seq, "unreferenced object 0x%08lx (size %zu):\n",
> +		  object->pointer, object->size);
> +	print_seq(seq, "  comm \"%s\", pid %d, jiffies %lu\n",
> +		  object->comm, object->pid, object->jiffies);
> +	print_seq(seq, "  backtrace:\n");
> +
> +	for (i = 0; i < object->trace_len; i++) {
> +		unsigned long trace = object->trace[i];

Could move the definition of `offset' to here.

> +		kallsyms_lookup(trace, &symsize, &offset, &modname, namebuf);
> +		print_seq(seq, "    [<%08lx>] %s\n", trace, namebuf);

We can't use the new %p thing here?

> +	}
> +}
> +
> +static void dump_object_info(struct memleak_object *object)
> +{
> +	struct stack_trace trace;
> +
> +	trace.nr_entries = object->trace_len;
> +	trace.entries = object->trace;
> +
> +	pr_notice("kmemleak: object 0x%08lx (size %zu):\n",
> +		  object->tree_node.start, object->size);
> +	pr_notice("  comm \"%s\", pid %d, jiffies %lu\n",
> +		  object->comm, object->pid, object->jiffies);
> +	pr_notice("  ref_count = %d\n", object->ref_count);
> +	pr_notice("  count = %d\n", object->count);
> +	pr_notice("  backtrace:\n");
> +	print_stack_trace(&trace, 4);
> +}
> +
> +static struct memleak_object *lookup_object(unsigned long ptr, int alias)
> +{
> +	struct prio_tree_node *node;
> +	struct prio_tree_iter iter;
> +	struct memleak_object *object;
> +
> +	prio_tree_iter_init(&iter, &object_tree_root, ptr, ptr);
> +	node = prio_tree_next(&iter);
> +	if (node) {
> +		object = prio_tree_entry(node, struct memleak_object, tree_node);
> +		if (!alias && object->pointer != ptr) {
> +			pr_warning("kmemleak: found object by alias");
> +			object = NULL;
> +		}
> +	} else
> +		object = NULL;
> +
> +	return object;
> +}
> +
> +/*
> + * Return 1 if successful or 0 otherwise.
> + */
> +static inline int get_object(struct memleak_object *object)
> +{
> +	return atomic_inc_not_zero(&object->use_count);
> +}
> +
> +static void free_object_rcu(struct rcu_head *rcu)
> +{
> +	struct hlist_node *elem, *tmp;
> +	struct memleak_scan_area *area;
> +	struct memleak_object *object =
> +		container_of(rcu, struct memleak_object, rcu);
> +
> +	/* once use_count is 0, there is no code accessing the object */
> +	hlist_for_each_entry_safe(area, elem, tmp, &object->area_list, node) {
> +		hlist_del(elem);
> +		kmem_cache_free(scan_area_cache, area);
> +	}
> +	kmem_cache_free(object_cache, object);
> +}
> +
> +static void put_object(struct memleak_object *object)
> +{
> +	unsigned long flags;
> +
> +	if (!atomic_dec_and_test(&object->use_count))
> +		return;
> +
> +	/* should only get here after delete_object was called */
> +	BUG_ON(object->flags & OBJECT_ALLOCATED);
> +
> +	spin_lock_irqsave(&object_list_lock, flags);
> +	/* the last reference to this object */
> +	list_del_rcu(&object->object_list);
> +	call_rcu(&object->rcu, free_object_rcu);
> +	spin_unlock_irqrestore(&object_list_lock, flags);
> +}
> +
> +static struct memleak_object *find_and_get_object(unsigned long ptr, int alias)
> +{
> +	unsigned long flags;
> +	struct memleak_object *object;
> +
> +	read_lock_irqsave(&object_tree_lock, flags);
> +	object = lookup_object(ptr, alias);
> +	if (object)
> +		get_object(object);
> +	read_unlock_irqrestore(&object_tree_lock, flags);
> +
> +	return object;
> +}
> +
> +/*
> + * Insert a pointer into the pointer hash table.
> + */
> +static inline void create_object(unsigned long ptr, size_t size, int ref_count)

I'd suggest removing the `inline', let the compiler work it out.

> +{
> +	unsigned long flags;
> +	struct memleak_object *object;
> +	struct prio_tree_node *node;
> +	struct stack_trace trace;
> +
> +	object = kmem_cache_alloc(object_cache, GFP_ATOMIC);
> +	if (!object)
> +		panic("kmemleak: cannot allocate a memleak_object structure\n");

That's a big problem, isn't it?  GFP_ATOMIC allocations are quite
unreliable, and we just nuked the machine?

> +	INIT_LIST_HEAD(&object->object_list);
> +	INIT_LIST_HEAD(&object->gray_list);
> +	INIT_HLIST_HEAD(&object->area_list);
> +	spin_lock_init(&object->lock);
> +	atomic_set(&object->use_count, 1);
> +	object->flags = OBJECT_ALLOCATED;
> +	object->pointer = ptr;
> +	object->size = size;
> +	object->ref_count = ref_count;
> +	object->count = -1;				/* black color initially */
> +	object->jiffies = jiffies;
> +	if (in_irq()) {
> +		object->pid = 0;
> +		strncpy(object->comm, "hardirq", TASK_COMM_LEN);
> +	} else if (in_softirq()) {
> +		object->pid = 0;
> +		strncpy(object->comm, "softirq", TASK_COMM_LEN);
> +	} else {
> +		object->pid = current->pid;
> +		strncpy(object->comm, current->comm, TASK_COMM_LEN);

Access to current->comm is a teeny bit racy. Use get_task_comm() here.

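i.e. (sketch):

	get_task_comm(object->comm, current);
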
> +	}
> +
> +	trace.max_entries = MAX_TRACE;
> +	trace.nr_entries = 0;
> +	trace.entries = object->trace;
> +	trace.skip = 1;
> +	save_stack_trace(&trace);
> +
> +	object->trace_len = trace.nr_entries;
> +
> +	INIT_PRIO_TREE_NODE(&object->tree_node);
> +	object->tree_node.start = ptr;
> +	object->tree_node.last = ptr + size - 1;
> +
> +	if (ptr < min_addr)
> +		min_addr = ptr;
> +	if (ptr + size > max_addr)
> +		max_addr = ptr + size;

Use min() and max()?

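i.e. (sketch):

	min_addr = min(min_addr, ptr);
	max_addr = max(max_addr, ptr + size);
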
> +	/*
> +	 * Update the boundaries before inserting the object in the
> +	 * prio search tree.
> +	 */
> +	smp_mb();

hm.  Is that needed?  I hope not, assuming that readers are taking
read_lock(object_tree_lock).

> +	write_lock_irqsave(&object_tree_lock, flags);
> +	node = prio_tree_insert(&object_tree_root, &object->tree_node);
> +	if (node != &object->tree_node) {
> +		unsigned long flags;
> +
> +		pr_warning("kmemleak: existing pointer\n");
> +		dump_stack();
> +
> +		object = lookup_object(ptr, 1);
> +		spin_lock_irqsave(&object->lock, flags);
> +		dump_object_info(object);
> +		spin_unlock_irqrestore(&object->lock, flags);
> +
> +		panic("kmemleak: cannot insert 0x%lx into the object search tree\n",
> +		      ptr);
> +	}
> +	write_unlock_irqrestore(&object_tree_lock, flags);
> +
> +	spin_lock_irqsave(&object_list_lock, flags);
> +	list_add_tail_rcu(&object->object_list, &object_list);
> +	spin_unlock_irqrestore(&object_list_lock, flags);
> +}
> +
> +/*
> + * Remove a pointer from the pointer hash table.
> + */
> +static inline void delete_object(unsigned long ptr)

uninline..

> +{
> +	unsigned long flags;
> +	struct memleak_object *object;
> +
> +	write_lock_irqsave(&object_tree_lock, flags);
> +	object = lookup_object(ptr, 0);
> +	if (!object) {
> +		pr_warning("kmemleak: freeing unknown object at 0x%08lx\n", ptr);
> +		dump_stack();
> +		write_unlock_irqrestore(&object_tree_lock, flags);
> +		return;
> +	}
> +	prio_tree_remove(&object_tree_root, &object->tree_node);
> +	write_unlock_irqrestore(&object_tree_lock, flags);
> +
> +	BUG_ON(!(object->flags & OBJECT_ALLOCATED));
> +
> +	spin_lock_irqsave(&object->lock, flags);
> +	object->flags &= ~OBJECT_ALLOCATED;
> +#ifdef REPORT_ORPHAN_FREEING
> +	if (color_white(object)) {
> +		pr_warning("kmemleak: freeing orphan object 0x%08lx\n", ptr);
> +		dump_stack();
> +		dump_object_info(object);
> +	}
> +#endif
> +	object->pointer = 0;
> +	spin_unlock_irqrestore(&object->lock, flags);
> +
> +	put_object(object);

Seems strange to take the object's internal lock if we're about to
delete it - nobody else should be using it anyway?

(If that put_object() may not actually free the object then this
observation is wrong..)

> +}
> +
> +/*
> + * Make an object permanently gray (false positive).
> + */
> +static inline void make_gray_object(unsigned long ptr)

Please review all inlining decisions in this patchset.

> +{
> +	unsigned long flags;
> +	struct memleak_object *object;
> +
> +	object = find_and_get_object(ptr, 0);
> +	if (!object) {
> +		dump_stack();
> +		panic("kmemleak: graying unknown object at 0x%08lx\n", ptr);
> +	}
> +
> +	spin_lock_irqsave(&object->lock, flags);
> +	object->ref_count = 0;
> +	spin_unlock_irqrestore(&object->lock, flags);
> +	put_object(object);
> +}
> +
> +/*
> + * Mark the object as black.
> + */
> +static inline void make_black_object(unsigned long ptr)
> +{
> +	unsigned long flags;
> +	struct memleak_object *object;
> +
> +	object = find_and_get_object(ptr, 0);
> +	if (!object) {
> +		dump_stack();
> +		panic("kmemleak: blacking unknown object at 0x%08lx\n", ptr);
> +	}
> +
> +	spin_lock_irqsave(&object->lock, flags);
> +	object->ref_count = -1;
> +	spin_unlock_irqrestore(&object->lock, flags);
> +	put_object(object);
> +}
> +
> +/*
> + * Add a scanning area to the object.
> + */
> +static inline void add_scan_area(unsigned long ptr, unsigned long offset, size_t length)

That comment isn't terribly useful ;)

Perhaps it could be enhanced a bit to tell us what a "scanning area"
is, etc.
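
Judging from scan_object() below, a sketch of a more useful wording:

	/*
	 * Add a scanning area to an object. Once at least one area is
	 * added, only the memory in these areas (rather than the whole
	 * block) is searched for pointers during a scan.
	 */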

> +{
> +	unsigned long flags;
> +	struct memleak_object *object;
> +	struct memleak_scan_area *area;
> +
> +	object = find_and_get_object(ptr, 0);
> +	if (!object) {
> +		dump_stack();
> +		panic("kmemleak: adding scan area to unknown object at 0x%08lx\n", ptr);
> +	}
> +
> +	area = kmem_cache_alloc(scan_area_cache, GFP_ATOMIC);
> +	if (!area)
> +		panic("kmemleak: cannot allocate a scan area\n");

ow, ow, ow.

> +	spin_lock_irqsave(&object->lock, flags);
> +	if (offset + length > object->size) {
> +		dump_stack();
> +		dump_object_info(object);
> +		panic("kmemleak: scan area larger than object 0x%08lx\n", ptr);
> +	}
> +
> +	INIT_HLIST_NODE(&area->node);
> +	area->offset = offset;
> +	area->length = length;
> +
> +	hlist_add_head(&area->node, &object->area_list);
> +	spin_unlock_irqrestore(&object->lock, flags);
> +	put_object(object);
> +}
> +
> +/*
> + * Log the early memleak_* calls.
> + */
> +static void __init memleak_early_log(int op_type, const void *ptr, size_t size,
> +				     int ref_count,
> +				     unsigned long offset, size_t length)
> +{
> +	unsigned long flags;
> +	struct early_log *log;
> +
> +	if (crt_early_log >= EARLY_LOG_NR)
> +		panic("kmemleak: early log buffer exceeded\n");
> +
> +	local_irq_save(flags);
> +	log = &early_log[crt_early_log];
> +	log->op_type = op_type;
> +	log->ptr = ptr;
> +	log->size = size;
> +	log->ref_count = ref_count;
> +	log->offset = offset;
> +	log->length = length;
> +	crt_early_log++;
> +	local_irq_restore(flags);
> +}
> +
> +/*
> + * Allocation function hook.
> + */
> +void memleak_alloc(const void *ptr, size_t size, int ref_count)
> +{
> +	pr_debug("%s(0x%p, %zu, %d)\n", __FUNCTION__, ptr, size, ref_count);
> +
> +	if (!atomic_read(&memleak_enabled)) {
> +		memleak_early_log(MEMLEAK_ALLOC, ptr, size, ref_count, 0, 0);
> +		return;
> +	}
> +	if (!ptr)
> +		return;
> +
> +	create_object((unsigned long)ptr, size, ref_count);
> +}
> +EXPORT_SYMBOL_GPL(memleak_alloc);

It would be nice to have some description of what all this early log
stuff _is_, and why it exists.  I can kinda guess, but I'd prefer not
to have to do so - guessing is unreliable.

> +/*
> + * Freeing function hook.
> + */
> +void memleak_free(const void *ptr)
> +{
> +	pr_debug("%s(0x%p)\n", __FUNCTION__, ptr);
> +
> +	if (!atomic_read(&memleak_enabled)) {
> +		memleak_early_log(MEMLEAK_FREE, ptr, 0, 0, 0, 0);
> +		return;
> +	}
> +	if (!ptr)
> +		return;
> +
> +	delete_object((unsigned long)ptr);
> +}
> +EXPORT_SYMBOL_GPL(memleak_free);
> +
> +/*
> + * Mark an object as a false positive.
> + */
> +void memleak_not_leak(const void *ptr)
> +{
> +	pr_debug("%s(0x%p)\n", __FUNCTION__, ptr);
> +
> +	if (!atomic_read(&memleak_enabled)) {
> +		memleak_early_log(MEMLEAK_NOT_LEAK, ptr, 0, 0, 0, 0);
> +		return;
> +	}
> +	if (!ptr)
> +		return;
> +
> +	make_gray_object((unsigned long)ptr);
> +}
> +EXPORT_SYMBOL(memleak_not_leak);

It would be appropriate to have some discussion somewhere describing
what a false positive is, and how it can come about, and how kmemleak
handles them.

Perhaps that's in the documentation somewhere.

> +/*
> + * Ignore this memory object.
> + */
> +void memleak_ignore(const void *ptr)
> +{
> +	pr_debug("%s(0x%p)\n", __FUNCTION__, ptr);
> +
> +	if (!atomic_read(&memleak_enabled)) {
> +		memleak_early_log(MEMLEAK_IGNORE, ptr, 0, 0, 0, 0);
> +		return;
> +	}
> +	if (!ptr)
> +		return;
> +
> +	make_black_object((unsigned long)ptr);
> +}
> +EXPORT_SYMBOL(memleak_ignore);
> +
> +/*
> + * Add a scanning area to an object.

This comment exactly duplicates the one over add_scan_area().  copy-n-paste?

> + */
> +void memleak_scan_area(const void *ptr, unsigned long offset, size_t length)
> +{
> +	pr_debug("%s(0x%p)\n", __FUNCTION__, ptr);
> +
> +	if (!atomic_read(&memleak_enabled)) {
> +		memleak_early_log(MEMLEAK_SCAN_AREA, ptr, 0, 0, offset, length);
> +		return;
> +	}
> +	if (!ptr)
> +		return;
> +
> +	add_scan_area((unsigned long)ptr, offset, length);
> +}
> +EXPORT_SYMBOL(memleak_scan_area);
> +
> +static inline void scan_yield(void)
> +{
> +	BUG_ON(in_atomic());

Please don't use in_atomic().  It's a low-level internal thing, the
results of which vary according to kernel configuration.  checkpatch
does/will/should warn about this.

Using might_sleep() would be appropriate here.
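
I.e., a sketch:

	/* complains loudly, with a backtrace, if we cannot sleep here */
	might_sleep();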

> +	if (time_is_before_eq_jiffies(next_scan_yield)) {
> +		schedule();
> +		next_scan_yield = jiffies + jiffies_scan_yield;
> +	}

Why is the yielding rate-limited?  The code needs a comment explaining
this, because the reader cannot tell.

> +	if (signal_pending(current))
> +		scan_should_stop = 1;

That looks odd.  per-task signal state will change kernel-wide state?

You can tell that I don't understand how all this code actually works. 
It would be nice if I _could_ tell that, from the code and its comments.

<looks around a bit>

Ah, this is run from a kernel thread?  Why is kthread_should_stop()
insufficient?

> +}
> +
> +/*
> + * Scan a block of memory (exclusive range) for pointers and move
> + * those found to the gray list.
> + */
> +static void scan_block(void *_start, void *_end, struct memleak_object *scanned)
> +{
> +	unsigned long *ptr;
> +	unsigned long *start = PTR_ALIGN(_start, BYTES_PER_WORD);
> +	unsigned long *end = _end - (BYTES_PER_WORD - 1);
> +
> +	for (ptr = start; ptr < end; ptr++) {
> +		unsigned long flags;
> +		unsigned long pointer = *ptr;
> +		struct memleak_object *object;
> +
> +		if (scan_should_stop)
> +			break;
> +
> +		/* If scanned, the CPU is yielded in the calling code */
> +		if (!scanned)
> +			scan_yield();
> +
> +		/*
> +		 * The boundaries check doesn't need to be precise
> +		 * (hence no locking) since orphan objects need to
> +		 * pass a scanning threshold before being reported.
> +		 */
> +		if (pointer < min_addr || pointer >= max_addr)
> +			continue;
> +
> +		object = find_and_get_object(pointer, 1);
> +		if (!object)
> +			continue;
> +		if (object == scanned) {
> +			/* self referenced */
> +			put_object(object);
> +			continue;
> +		}
> +
> +		/*
> +		 * Avoid the lockdep recursive warning on object->lock
> +		 * being previously acquired in scan_object(). These
> +		 * locks are enclosed by scan_mutex.
> +		 */
> +		spin_lock_irqsave_nested(&object->lock, flags, SINGLE_DEPTH_NESTING);
> +		if (!color_white(object)) {
> +			/* non-orphan, ignored or new */
> +			spin_unlock_irqrestore(&object->lock, flags);
> +			put_object(object);
> +			continue;
> +		}
> +
> +		object->count++;
> +		if (color_gray(object))
> +			/* the object became gray, add it to the list */
> +			list_add_tail(&object->gray_list, &gray_list);
> +		else
> +			put_object(object);
> +		spin_unlock_irqrestore(&object->lock, flags);
> +	}
> +}
> +
> +/*
> + * Scan a memory block represented by a memleak_object.
> + */
> +static inline void scan_object(struct memleak_object *object)
> +{
> +	struct memleak_scan_area *area;
> +	struct hlist_node *elem;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&object->lock, flags);
> +
> +	/* freed object */
> +	if (!(object->flags & OBJECT_ALLOCATED))
> +		goto out;
> +
> +	if (hlist_empty(&object->area_list))
> +		scan_block((void *)object->pointer,
> +			   (void *)(object->pointer + object->size), object);
> +	else
> +		hlist_for_each_entry(area, elem, &object->area_list, node)
> +			scan_block((void *)(object->pointer + area->offset),
> +				   (void *)(object->pointer + area->offset
> +					    + area->length), object);
> +
> + out:
> +	spin_unlock_irqrestore(&object->lock, flags);
> +}
> +
> +/*
> + * Scan the memory and print the orphan objects.
> + */
> +static void memleak_scan(void)
> +{
> +	unsigned long flags;
> +	struct memleak_object *object, *tmp;
> +	int i;
> +#ifdef SCAN_TASK_STACKS
> +	struct task_struct *task;
> +#endif
> +
> +	scan_should_stop = 0;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(object, &object_list, object_list) {
> +		spin_lock_irqsave(&object->lock, flags);
> +#ifdef DEBUG
> +		/*
> +		 * With a few exceptions there should be a maximum of
> +		 * 1 reference to any object at this point.
> +		 */
> +		if (atomic_read(&object->use_count) > 1) {
> +			pr_debug("kmemleak: object->use_count = %d\n",
> +				 atomic_read(&object->use_count));
> +			dump_object_info(object);
> +		}
> +#endif
> +		/* reset the reference count (whiten the object) */
> +		object->count = 0;
> +		if (color_gray(object) && get_object(object))
> +			list_add_tail(&object->gray_list, &gray_list);
> +
> +		spin_unlock_irqrestore(&object->lock, flags);
> +	}
> +	rcu_read_unlock();
> +
> +	/* data/bss scanning */
> +	scan_block(_sdata, _edata, NULL);
> +	scan_block(__bss_start, __bss_stop, NULL);
> +
> +#ifdef CONFIG_SMP
> +	/* per-cpu scanning */
> +	for_each_possible_cpu(i)
> +		scan_block(__per_cpu_start + per_cpu_offset(i),
> +			   __per_cpu_end + per_cpu_offset(i), NULL);
> +#endif
> +
> +	/* mem_map scanning */
> +	for_each_online_node(i) {
> +		struct page *page, *end;
> +
> +		page = NODE_MEM_MAP(i);
> +		end  = page + NODE_DATA(i)->node_spanned_pages;
> +
> +		scan_block(page, end, NULL);
> +	}

hmm, do we really need to go this low into the mm data structures?

What assumptions are being made about the contiguity of a node's memory
here?

Perhaps this code should be using pfns and pfn_valid(), dunno.
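
Perhaps something like this untested sketch; walking pfns avoids any
assumption about the contiguity of a node's mem_map:

	for_each_online_node(i) {
		unsigned long pfn;
		unsigned long start_pfn = NODE_DATA(i)->node_start_pfn;
		unsigned long end_pfn = start_pfn
					+ NODE_DATA(i)->node_spanned_pages;

		for (pfn = start_pfn; pfn < end_pfn; pfn++) {
			if (!pfn_valid(pfn))
				continue;
			/* scan one struct page at a time */
			scan_block(pfn_to_page(pfn),
				   pfn_to_page(pfn) + 1, NULL);
		}
	}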

> +#ifdef SCAN_TASK_STACKS
> +	read_lock(&tasklist_lock);
> +	for_each_process(task)
> +		scan_block(task_stack_page(task),
> +			   task_stack_page(task) + THREAD_SIZE, NULL);
> +	read_unlock(&tasklist_lock);
> +#endif
> +
> +	/*
> +	 * Scan the objects already referenced. More objects will be
> +	 * referenced and, if there are no memory leaks, all the
> +	 * objects will be scanned. The list traversal is safe for
> +	 * both tail additions and removals from inside the loop. The
> +	 * memleak objects cannot be freed from outside the loop
> +	 * because their use_count was increased.
> +	 */
> +	object = list_entry(gray_list.next, typeof(*object), gray_list);
> +	while (&object->gray_list != &gray_list) {
> +		scan_yield();
> +
> +		/* may add new objects to the list */
> +		if (!scan_should_stop)
> +			scan_object(object);
> +
> +		tmp = list_entry(object->gray_list.next, typeof(*object),
> +				 gray_list);
> +
> +		/* remove the object from the list and release it */
> +		list_del(&object->gray_list);
> +		put_object(object);
> +
> +		object = tmp;
> +	}
> +	BUG_ON(!list_empty(&gray_list));
> +}
> +
> +static void *memleak_seq_start(struct seq_file *seq, loff_t *pos)
> +{
> +	struct memleak_object *object;
> +	loff_t n = *pos;
> +
> +	if (!n) {
> +		memleak_scan();
> +		reported_leaks = 0;
> +	}
> +	if (reported_leaks >= REPORTS_NR)
> +		return NULL;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(object, &object_list, object_list) {
> +		if (n-- > 0)
> +			continue;
> +
> +		if (get_object(object))
> +			goto out;
> +	}
> +	object = NULL;
> + out:
> +	rcu_read_unlock();
> +	return object;
> +}
> +
> +static void *memleak_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> +{
> +	struct list_head *n;
> +	struct memleak_object *next = NULL;
> +	unsigned long flags;
> +
> +	++(*pos);
> +	if (reported_leaks >= REPORTS_NR)
> +		goto out;
> +
> +	spin_lock_irqsave(&object_list_lock, flags);
> +	n = ((struct memleak_object *)v)->object_list.next;

Now what's going on here.  Can this be cleaned up?  container_of(),
list_entry(), etc?
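
Perhaps, as a sketch:

	struct memleak_object *object = v;

	spin_lock_irqsave(&object_list_lock, flags);
	if (!list_is_last(&object->object_list, &object_list)) {
		next = list_entry(object->object_list.next,
				  struct memleak_object, object_list);
		get_object(next);
	}
	spin_unlock_irqrestore(&object_list_lock, flags);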

> +	if (n != &object_list) {
> +		next = list_entry(n, struct memleak_object, object_list);
> +		get_object(next);
> +	}
> +	spin_unlock_irqrestore(&object_list_lock, flags);
> +
> + out:
> +	put_object(v);
> +	return next;
> +}
> +
> +static void memleak_seq_stop(struct seq_file *seq, void *v)
> +{
> +	if (v)

Can `v' be NULL here?  Is seq_stop() actually called if seq_start()
returned NULL?

> +		put_object(v);
> +}
> +
> +static int memleak_seq_show(struct seq_file *seq, void *v)
> +{
> +	struct memleak_object *object = v;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&object->lock, flags);
> +	if (!unreferenced_object(object))
> +		goto out;
> +	print_unreferenced(seq, object);
> +	reported_leaks++;

I'm scratching my head over what this does but alas, global variable
reported_leaks is undocumented.

> +out:
> +	spin_unlock_irqrestore(&object->lock, flags);
> +	return 0;
> +}
> +
> +static struct seq_operations memleak_seq_ops = {

Can be const.

> +	.start = memleak_seq_start,
> +	.next  = memleak_seq_next,
> +	.stop  = memleak_seq_stop,
> +	.show  = memleak_seq_show,
> +};
> +
> +static int memleak_seq_open(struct inode *inode, struct file *file)
> +{
> +	int ret = mutex_lock_interruptible(&scan_mutex);
> +	if (ret < 0)
> +		return ret;
> +	ret = seq_open(file, &memleak_seq_ops);
> +	if (ret < 0)
> +		mutex_unlock(&scan_mutex);
> +	return ret;
> +}
> +
> +static int memleak_seq_release(struct inode *inode, struct file *file)
> +{
> +	int ret = seq_release(inode, file);
> +	mutex_unlock(&scan_mutex);
> +	return ret;
> +}
> +
> +static struct file_operations memleak_fops = {

const

> +	.owner	 = THIS_MODULE,
> +	.open    = memleak_seq_open,
> +	.read    = seq_read,
> +	.llseek  = seq_lseek,
> +	.release = memleak_seq_release,
> +};
> +
> +static int memleak_scan_thread(void *arg)
> +{
> +	/* sleep before the first scan */

The reader wonders why...

> +	ssleep(SECS_FIRST_SCAN);
> +
> +	while (!kthread_should_stop()) {
> +		struct memleak_object *object;
> +
> +		mutex_lock(&scan_mutex);
> +
> +		memleak_scan();
> +		reported_leaks = 0;
> +
> +		rcu_read_lock();
> +		list_for_each_entry_rcu(object, &object_list, object_list) {
> +			unsigned long flags;
> +
> +			if (reported_leaks >= REPORTS_NR)
> +				break;
> +			spin_lock_irqsave(&object->lock, flags);
> +			if (!(object->flags & OBJECT_REPORTED) &&
> +			    unreferenced_object(object)) {
> +				print_unreferenced(NULL, object);
> +				object->flags |= OBJECT_REPORTED;
> +				reported_leaks++;
> +			}
> +			spin_unlock_irqrestore(&object->lock, flags);
> +		}
> +		rcu_read_unlock();
> +
> +		mutex_unlock(&scan_mutex);
> +		ssleep(SECS_SCAN_PERIOD);
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Kmemleak initialization.
> + */
> +void __init memleak_init(void)
> +{
> +	int i;
> +
> +	jiffies_scan_yield = msecs_to_jiffies(MSECS_SCAN_YIELD);
> +	jiffies_min_age = msecs_to_jiffies(MSECS_MIN_AGE);
> +	object_cache = kmem_cache_create("kmemleak_object",
> +					 sizeof(struct memleak_object),
> +					 0, SLAB_NOLEAKTRACE, NULL);
> +	scan_area_cache = kmem_cache_create("kmemleak_scan_area",
> +					    sizeof(struct memleak_scan_area),
> +					    0, SLAB_NOLEAKTRACE, NULL);

We have a little KMEM_CACHE() macro.
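
I.e., a sketch (note that KMEM_CACHE() derives the cache name and
alignment from the struct, so the names become "memleak_object" and
"memleak_scan_area"):

	object_cache = KMEM_CACHE(memleak_object, SLAB_NOLEAKTRACE);
	scan_area_cache = KMEM_CACHE(memleak_scan_area, SLAB_NOLEAKTRACE);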

> +	INIT_PRIO_TREE_ROOT(&object_tree_root);
> +
> +	/*
> +	 * This is the point where tracking allocations is safe.
> +	 * Scanning is only available later.
> +	 */
> +	atomic_set(&memleak_enabled, 1);
> +
> +	/* execute the early logged operations */
> +	for (i = 0; i < crt_early_log; i++) {
> +		struct early_log *log = &early_log[i];
> +
> +		switch (log->op_type) {
> +		case MEMLEAK_ALLOC:
> +			memleak_alloc(log->ptr, log->size, log->ref_count);
> +			break;
> +		case MEMLEAK_FREE:
> +			memleak_free(log->ptr);
> +			break;
> +		case MEMLEAK_NOT_LEAK:
> +			memleak_not_leak(log->ptr);
> +			break;
> +		case MEMLEAK_IGNORE:
> +			memleak_ignore(log->ptr);
> +			break;
> +		case MEMLEAK_SCAN_AREA:
> +			memleak_scan_area(log->ptr, log->offset, log->length);
> +			break;
> +		default:
> +			BUG();
> +		}
> +	}
> +}
> +
> +/*
> + * Late initialization function.
> + */
> +static int __init memleak_late_init(void)
> +{
> +	struct dentry *dentry;
> +
> +	dentry = debugfs_create_file("memleak", S_IRUGO, NULL, NULL,
> +				     &memleak_fops);
> +	if (!dentry)
> +		return -ENOMEM;
> +
> +	scan_thread = kthread_run(memleak_scan_thread, NULL, "kmemleak");
> +	if (IS_ERR(scan_thread))
> +		pr_warning("kmemleak: Failed to initialise the scan thread\n");
> +
> +	pr_info("Kernel memory leak detector initialized\n");
> +
> +	return 0;
> +}
> +late_initcall(memleak_late_init);


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/15] kmemleak: Add the base support
  2008-12-02 21:28   ` Andrew Morton
@ 2008-12-04 18:50     ` Catalin Marinas
  2008-12-04 19:03       ` Andrew Morton
  2008-12-18  9:28     ` Catalin Marinas
  1 sibling, 1 reply; 35+ messages in thread
From: Catalin Marinas @ 2008-12-04 18:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, mingo

Hi Andrew,

Thanks for your comments. There are some clarifications below (I'm OK
with the rest of your comments).

On Tue, 2008-12-02 at 13:28 -0800, Andrew Morton wrote:
> On Sat, 29 Nov 2008 10:43:12 +0000
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> > @@ -653,6 +654,8 @@ asmlinkage void __init start_kernel(void)
> >  	enable_debug_pagealloc();
> >  	cpu_hotplug_init();
> >  	kmem_cache_init();
> > +	prio_tree_init();
> > +	memleak_init();
> >  	debug_objects_mem_init();
> >  	idr_init_cache();
> >  	setup_per_cpu_pageset();
> > @@ -662,7 +665,6 @@ asmlinkage void __init start_kernel(void)
> >  	calibrate_delay();
> >  	pidmap_init();
> >  	pgtable_cache_init();
> > -	prio_tree_init();
> 
> Yeah.
> 
> prio_tree_init() could be done at compile-time, even.

I can't figure out how to do it in a readable way while keeping it
dependent on BITS_PER_LONG.

> > +#define MSECS_SCAN_YIELD	10
> > +#define SECS_FIRST_SCAN		60
> > +#define SECS_SCAN_PERIOD	600
> > +
> > +/* scanning area inside a memory block */
> > +struct memleak_scan_area {
> > +	struct hlist_node node;
> > +	unsigned long offset;
> > +	size_t length;
> > +};
> > +
> > +/* the main allocation tracking object */
> > +struct memleak_object {
> > +	spinlock_t lock;
> > +	unsigned long flags;
> > +	struct list_head object_list;
> > +	struct list_head gray_list;
> > +	struct prio_tree_node tree_node;
> > +	struct rcu_head rcu;		/* used for object_list lockless traversal */
> > +	atomic_t use_count;		/* internal usage count */
> > +	unsigned long pointer;
> > +	size_t size;
> > +	int ref_count;			/* the minimum encounters of the pointer */
> > +	int count;			/* the ecounters of the pointer */
> 
> Was that supposed to read "encounters"?
> 
> The reader of the above documentation will be wondering what an
> "encounter" is.
> 
> `count', `refcount' and `use_count' is getting confusing...

OK, I changed the comments and removed "encounters" (which might be a
wrong interpretation of this word).

> > +	struct hlist_head area_list;	/* areas to be scanned (or empty for all) */
> > +	unsigned long trace[MAX_TRACE];
> > +	unsigned int trace_len;
> > +	unsigned long jiffies;		/* creation timestamp */
> > +	pid_t pid;			/* pid of the current task */
> 
> Two tasks in different pid namespaces can have the same pid.  What
> effect will that have upon kmemleak?

It won't have any effect. The pid is just an additional information
displayed when reporting an unreferenced object.

> > +	char comm[TASK_COMM_LEN];	/* executable name */
> > +};
> > +
> > +/* The list of all allocated objects */
> > +static LIST_HEAD(object_list);
> > +static DEFINE_SPINLOCK(object_list_lock);
> > +/* The list of the gray objects */
> > +static LIST_HEAD(gray_list);
> 
> It would be nice to briefly define the term "gray" before using it.

Yes, it needs a better comment (the terminology is known in the garbage
collectors world).

> > +/* prio search tree for object boundaries */
> > +static struct prio_tree_root object_tree_root;
> > +static DEFINE_RWLOCK(object_tree_lock);
> > +
> > +/* allocation caches */
> > +static struct kmem_cache *object_cache;
> > +static struct kmem_cache *scan_area_cache;
> > +
> > +static atomic_t memleak_enabled = ATOMIC_INIT(0);
> > +
> > +/* minimum and maximum address that may be valid pointers */
> > +static unsigned long min_addr = ~0;
> 
> <writes a test program>
> 
> OK, I think you got lucky here.
> 
> "0" is a signed 32-bit integer, having the all-zeroes pattern.
> 
> "~0" is a signed 32-bit integer, having the all-ones bit pattern.
> 
> When that gets assigned to an unsigned long it gets sign-extended
> first, so you indeed ended up with the 64-bit all-ones pattern.
> 
> All very tricky!  I like to use plain old "-1" for the all-ones pattern
> and let the compiler do the work.  It always works.

It might be even better to use ULONG_MAX here.
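
I.e., a sketch:

	/* all-ones pattern, without the sign-extension subtlety */
	static unsigned long min_addr = ULONG_MAX;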

> > +static unsigned long max_addr;
> > +
> > +/* used for yielding the CPU to other tasks during scanning */
> > +static unsigned long next_scan_yield;
> > +static struct task_struct *scan_thread;
> > +static DEFINE_MUTEX(scan_mutex);
> > +static int reported_leaks;
> > +static int scan_should_stop;
> > +
> > +static unsigned long jiffies_scan_yield;
> > +static unsigned long jiffies_min_age;
> > +
> > +/*
> > + * Early object allocation (before kmemleak is fully initialised).
> > + */
> > +#define EARLY_LOG_NR		200
> > +
> > +enum {
> > +	MEMLEAK_ALLOC,
> > +	MEMLEAK_FREE,
> > +	MEMLEAK_NOT_LEAK,
> > +	MEMLEAK_IGNORE,
> > +	MEMLEAK_SCAN_AREA,
> > +};
> 
> It looks to me like the above states are pretty important to
> understanding the code.  Perhaps they each should be documented?

I agree, the early logging part needs commenting. The above values
identify the callbacks that cannot be handled directly because
memleak_init() hasn't been called yet. Once kmemleak is initialised, it
goes through the array and calls the corresponding
memleak_alloc/memleak_free/... functions.
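
A sketch of the kind of comment that could go on top of the enum:

	/*
	 * Operation types for the early_log[] array: callbacks arriving
	 * before memleak_init() are recorded with one of these values
	 * and replayed once the object cache is ready.
	 */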

> > +struct early_log {
> > +	int op_type;			/* kmemleak operation type */
> > +	const void *ptr;
> > +	size_t size;
> > +	int ref_count;
> > +	unsigned long offset;
> > +	size_t length;
> > +};
> 
> Undocumented?  What's it for, and what do the fields do?

I'll add comments. The members are just arguments passed to the kmemleak
callbacks that need to be stored before kmemleak is fully initialised.
Maybe some other name like early_buffer would be more meaningful.

> > +{
> > +	unsigned long flags;
> > +	struct memleak_object *object;
> > +	struct prio_tree_node *node;
> > +	struct stack_trace trace;
> > +
> > +	object = kmem_cache_alloc(object_cache, GFP_ATOMIC);
> > +	if (!object)
> > +		panic("kmemleak: cannot allocate a memleak_object structure\n");
> 
> That's a big problem, isn't it?  GFP_ATOMIC allocations are quite
> unreliable, and we just nuked the machine?

That's still on the to-do list. The first step I did, as suggested by
others (I think Pekka), was to use the same flags as those used by the
code triggering the callback. This way we don't need to always have
GFP_ATOMIC.

The next step is to change most of the panic/BUG stuff in the code to
something more friendly, like disabling kmemleak, rather than locking up
the machine.

> > +	/*
> > +	 * Update the boundaries before inserting the object in the
> > +	 * prio search tree.
> > +	 */
> > +	smp_mb();
> 
> hm.  Is that needed?  I hope not, assuming that readers are taking
> read_lock(object_tree_lock).

The reader of the min_addr/max_addr variables isn't taking the
object_tree_lock at that point. It does a simple boundary check and, if
this check succeeds, it looks up the object tree (with
read_lock(object_tree_lock)). My thinking here was that the boundaries
update might not become visible before the object_tree update. And
adding locking around these boundaries looks too heavyweight, especially
when we have to check a huge amount of values during scanning.

Now, I don't think it really matters whether the boundaries are visible
or not because we could see the boundaries updated and take the
read_lock(object_tree_lock) before the thread updating the boundaries
gets the chance to do write_lock(object_tree_lock).

There may be another hypothetical situation in which the boundaries will
never become visible to other CPUs unless a memory barrier is used. But
the write_unlock following the min_addr/max_addr update already has such
barriers.

So, I think this barrier can be safely removed.

> > +{
> > +	unsigned long flags;
> > +	struct memleak_object *object;
> > +
> > +	write_lock_irqsave(&object_tree_lock, flags);
> > +	object = lookup_object(ptr, 0);
> > +	if (!object) {
> > +		pr_warning("kmemleak: freeing unknown object at 0x%08lx\n", ptr);
> > +		dump_stack();
> > +		write_unlock_irqrestore(&object_tree_lock, flags);
> > +		return;
> > +	}
> > +	prio_tree_remove(&object_tree_root, &object->tree_node);
> > +	write_unlock_irqrestore(&object_tree_lock, flags);
> > +
> > +	BUG_ON(!(object->flags & OBJECT_ALLOCATED));
> > +
> > +	spin_lock_irqsave(&object->lock, flags);
> > +	object->flags &= ~OBJECT_ALLOCATED;
> > +#ifdef REPORT_ORPHAN_FREEING
> > +	if (color_white(object)) {
> > +		pr_warning("kmemleak: freeing orphan object 0x%08lx\n", ptr);
> > +		dump_stack();
> > +		dump_object_info(object);
> > +	}
> > +#endif
> > +	object->pointer = 0;
> > +	spin_unlock_irqrestore(&object->lock, flags);
> > +
> > +	put_object(object);
> 
> Seems strange to take the object's internal lock if we're about to
> delete it - nobody else should be using it anyway?

Not really, we may call the delete_object() function during a scanning
episode, when object->use_count could be higher than 1, in which case
put_object() wouldn't actually free the object until use_count drops
to 0.

> > +{
> > +	unsigned long flags;
> > +	struct memleak_object *object;
> > +	struct memleak_scan_area *area;
> > +
> > +	object = find_and_get_object(ptr, 0);
> > +	if (!object) {
> > +		dump_stack();
> > +		panic("kmemleak: adding scan area to unknown object at 0x%08lx\n", ptr);
> > +	}
> > +
> > +	area = kmem_cache_alloc(scan_area_cache, GFP_ATOMIC);
> > +	if (!area)
> > +		panic("kmemleak: cannot allocate a scan area\n");
> 
> ow, ow, ow.

Yes, I'll sort them out.

> > +/*
> > + * Mark an object as a false positive.
> > + */
> > +void memleak_not_leak(const void *ptr)
> > +{
> > +	pr_debug("%s(0x%p)\n", __FUNCTION__, ptr);
> > +
> > +	if (!atomic_read(&memleak_enabled)) {
> > +		memleak_early_log(MEMLEAK_NOT_LEAK, ptr, 0, 0, 0, 0);
> > +		return;
> > +	}
> > +	if (!ptr)
> > +		return;
> > +
> > +	make_gray_object((unsigned long)ptr);
> > +}
> > +EXPORT_SYMBOL(memleak_not_leak);
> 
> It would be appropriate to have some discussion somewhere describing
> what a false positive is, and how it can come about, and how kmemleak
> handles them.
> 
> Perhaps that's in the documentation somewhere.

There is Documentation/kmemleak.txt and it says something about false
positives/negatives but it can be improved.

> > + */
> > +void memleak_scan_area(const void *ptr, unsigned long offset, size_t length)
> > +{
> > +	pr_debug("%s(0x%p)\n", __FUNCTION__, ptr);
> > +
> > +	if (!atomic_read(&memleak_enabled)) {
> > +		memleak_early_log(MEMLEAK_SCAN_AREA, ptr, 0, 0, offset, length);
> > +		return;
> > +	}
> > +	if (!ptr)
> > +		return;
> > +
> > +	add_scan_area((unsigned long)ptr, offset, length);
> > +}
> > +EXPORT_SYMBOL(memleak_scan_area);
> > +
> > +static inline void scan_yield(void)
> > +{
> > +	BUG_ON(in_atomic());
> 
> Please don't use in_atomic().  It's a low-level internal thing, the
> results of which vary according to kernel configuration.  checkpatch
> does/will/should warn about this.

It didn't.

> Using might_sleep() would be appropriate here.

OK.

> > +	if (time_is_before_eq_jiffies(next_scan_yield)) {
> > +		schedule();
> > +		next_scan_yield = jiffies + jiffies_scan_yield;
> > +	}
> 
> Why is the yielding rate-limited?  The code needs a comment explaining
> this, because the reader cannot tell.

I'll add a comment. It is rate-limited to avoid calling schedule() for
every pointer checked, which would be really time-consuming.
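
A sketch of the comment:

	/*
	 * Rate-limit the yielding: calling schedule() for every pointer
	 * checked would make the scan prohibitively slow.
	 */
	if (time_is_before_eq_jiffies(next_scan_yield)) {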

> > +	if (signal_pending(current))
> > +		scan_should_stop = 1;
> 
> That looks odd.  per-task signal state will change kernel-wide state?
> 
> You can tell that I don't understand how all this code actually works. 
> It would be nice if I _could_ tell that, from the code and its comments.
> 
> <looks around a bit>
> 
> Ah, this is run from a kernel thread?  Why is kthread_should_stop()
> insufficient?

The memleak_scan() function (and the subsequently called scan_block)
can be invoked either from a kernel thread or from a user process
reading the /sys/kernel/debug/memleak file. The signal_pending() check
is intended for the latter case. Scanning takes a considerable amount
of time (minutes) and the user might decide to stop it, hence
scan_should_stop = 1, after which the scan_block() function avoids any
subsequent pointer look-ups.

The kthread_should_stop() is called only outside the memleak_scan()
function. For the kthread, there is currently no way to stop it in the
middle of a scan.

An alternative would be to differentiate the scan_yield() checks
(process or kthread) with an additional argument that gets passed via
memleak_scan(). Or, is there a way to tell whether it's in a kernel
thread or user process context? Checking current->mm?

> > +	/* mem_map scanning */
> > +	for_each_online_node(i) {
> > +		struct page *page, *end;
> > +
> > +		page = NODE_MEM_MAP(i);
> > +		end  = page + NODE_DATA(i)->node_spanned_pages;
> > +
> > +		scan_block(page, end, NULL);
> > +	}
> 
> hmm, do we really need to go this low into the mm data structures?
> 
> What assumptions are being made about the contiguity of a node's memory
> here?
> 
> Perhaps this code should be using pfns and pfn_valid(), dunno.

I'll have to review this, but there are several objects referred to via
the "struct page" entries in mem_map, so kmemleak scans only these
structures; otherwise there would be plenty of false positives.

> > +static void *memleak_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> > +{
> > +	struct list_head *n;
> > +	struct memleak_object *next = NULL;
> > +	unsigned long flags;
> > +
> > +	++(*pos);
> > +	if (reported_leaks >= REPORTS_NR)
> > +		goto out;
> > +
> > +	spin_lock_irqsave(&object_list_lock, flags);
> > +	n = ((struct memleak_object *)v)->object_list.next;
> 
> Now what's going on here.  Can this be cleaned up?  container_of(),
> list_entry(), etc?

I'll look at it. I mainly did what I saw in other parts of the kernel.

> > +static int memleak_scan_thread(void *arg)
> > +{
> > +	/* sleep before the first scan */
> 
> The reader wonders why...

Because Ingo asked for it :-). It's mainly for his automatic testing.
I'll add a meaningful comment.

Thanks again for your comments. I'll post another set of patches once I
sort these comments out.

-- 
Catalin


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/15] kmemleak: Add the base support
  2008-12-04 18:50     ` Catalin Marinas
@ 2008-12-04 19:03       ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2008-12-04 19:03 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-kernel, mingo

On Thu, 04 Dec 2008 18:50:46 +0000 Catalin Marinas <catalin.marinas@arm.com> wrote:

> Or, is there a way to tell whether it's in a kernel
> thread or user process context? Checking current->mm?

Yup, that'll work.  There might be small windows during process startup
and termination where a real process can have a NULL ->mm, but those
are probably benign in this case.
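
For the record, a sketch of what that check could look like in
scan_yield(), assuming the scanning kthread is the only caller with a
NULL ->mm:

	/* only a user process reading the debugfs file can interrupt */
	if (current->mm && signal_pending(current))
		scan_should_stop = 1;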


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/15] kmemleak: Add the base support
  2008-12-02 21:28   ` Andrew Morton
  2008-12-04 18:50     ` Catalin Marinas
@ 2008-12-18  9:28     ` Catalin Marinas
  1 sibling, 0 replies; 35+ messages in thread
From: Catalin Marinas @ 2008-12-18  9:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, mingo

On Tue, 2008-12-02 at 13:28 -0800, Andrew Morton wrote:
> On Sat, 29 Nov 2008 10:43:12 +0000
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> > +	INIT_LIST_HEAD(&object->object_list);
> > +	INIT_LIST_HEAD(&object->gray_list);
> > +	INIT_HLIST_HEAD(&object->area_list);
> > +	spin_lock_init(&object->lock);
> > +	atomic_set(&object->use_count, 1);
> > +	object->flags = OBJECT_ALLOCATED;
> > +	object->pointer = ptr;
> > +	object->size = size;
> > +	object->ref_count = ref_count;
> > +	object->count = -1;				/* black color initially */
> > +	object->jiffies = jiffies;
> > +	if (in_irq()) {
> > +		object->pid = 0;
> > +		strncpy(object->comm, "hardirq", TASK_COMM_LEN);
> > +	} else if (in_softirq()) {
> > +		object->pid = 0;
> > +		strncpy(object->comm, "softirq", TASK_COMM_LEN);
> > +	} else {
> > +		object->pid = current->pid;
> > +		strncpy(object->comm, current->comm, TASK_COMM_LEN);
> 
> Access to current->comm is a teeny but racy.  Use get_task_comm() here.

This seems to cause some problems. First of all, the IRQs aren't
necessarily disabled, and get_task_comm() acquires current->alloc_lock
without disabling the IRQs (I could add local_irq_save/restore around
it). The alloc_lock comment in task_struct also states that the lock
protects the (de)allocation of some members, which may imply that the
lock could already be held when kmemleak is called.

The other issue which I couldn't completely figure out is a lockdep
warning caused by calling get_task_comm() in kmemleak:

Starting the hotplug events dispatcher: udevd                                   
======================================================                          
[ INFO: soft-safe -> soft-unsafe lock order detected ]                          
2.6.28-rc8-00074-g1134084-dirty #158                                            
------------------------------------------------------                          
udevd/350 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:                         
 (&p->alloc_lock){--..}, at: [<c008a000>] get_task_comm+0x20/0x40               
                                                                                
and this task is already holding:                                               
 (nl_table_lock){..-?}, at: [<c01c3628>] netlink_table_grab+0x18/0xd0           
which would create a new lock dependency:                                       
 (nl_table_lock){..-?} -> (&p->alloc_lock){--..}                                
                                                                                
but this new dependency connects a soft-irq-safe lock:                          
 (nl_table_lock){..-?}                                                          
... which became soft-irq-safe at:                                              
  [<c0055770>] __lock_acquire+0x51c/0x14b0                                      
  [<c0056768>] lock_acquire+0x64/0x78                                           
  [<c0229690>] _read_lock+0x34/0x44                                             
  [<c01c474c>] netlink_broadcast+0xbc/0x3bc                                     
  [<c01c52f4>] nlmsg_notify+0x6c/0x98                                           
  [<c01bde40>] rtnl_notify+0x44/0x4c                                            
  [<c01ba0f0>] __neigh_notify+0x98/0xcc                                         
  [<c01bb35c>] neigh_update_notify+0x2c/0x30                                    
  [<c01bbad8>] neigh_update+0x350/0x36c                                         
  [<c01ee5ec>] arp_process+0x624/0x6b0                                          
  [<c01ee764>] arp_rcv+0xd8/0xec                                                
  [<c01b34b4>] netif_receive_skb+0x278/0x2bc                                    
  [<c01b3580>] process_backlog+0x88/0x120                                       
  [<c01b1b78>] net_rx_action+0x6c/0x19c                                         
  [<c0039f34>] __do_softirq+0x74/0x124                                          
  [<c003a03c>] irq_exit+0x58/0x60                                               
  [<c0020068>] __exception_text_start+0x68/0x84                                 
  [<c0020a1c>] __irq_svc+0x3c/0x80                                              
  [<c0022360>] cpu_idle+0x38/0x54                                               
  [<c0223a30>] rest_init+0x58/0x6c                                              
  [<c000892c>] start_kernel+0x234/0x27c                                         
  [<70008034>] 0x70008034                                                       
  [<ffffffff>] 0xffffffff                                                       
                                                                                
to a soft-irq-unsafe lock:                                                      
 (&p->alloc_lock){--..}                                                         
... which became soft-irq-unsafe at:                                            
...  [<c0055800>] __lock_acquire+0x5ac/0x14b0                                   
  [<c0056768>] lock_acquire+0x64/0x78                                           
  [<c02292e0>] _spin_lock+0x3c/0x4c                                             
  [<c0089944>] set_task_comm+0x20/0x3c                                          
  [<c0049774>] kthreadd+0x2c/0x174                                              
  [<c0037db0>] do_exit+0x0/0x6ec                                                
  [<ffffffff>] 0xffffffff                                                       

[ ... removed some other info ... ]

stack backtrace:
[<c0226594>] (dump_stack+0x0/0x14) from [<c00551f0>] (check_usage+0x34c/0x3b0)
[<c0054ea4>] (check_usage+0x0/0x3b0) from [<c00560ac>] (__lock_acquire+0xe58/0x14b0)
[<c0055254>] (__lock_acquire+0x0/0x14b0) from [<c0056768>] (lock_acquire+0x64/0x78)
[<c0056704>] (lock_acquire+0x0/0x78) from [<c02292e0>] (_spin_lock+0x3c/0x4c)
 r7:df4cebd8 r6:df4cebd8 r5:df41328c r4:df41328c
[<c02292a4>] (_spin_lock+0x0/0x4c) from [<c008a000>] (get_task_comm+0x20/0x40)
 r4:df413000
[<c0089fe0>] (get_task_comm+0x0/0x40) from [<c00828a8>] (kmemleak_alloc+0x134/0x2b0)
 r7:df4cebd8 r6:df4ceb28 r5:df41c000 r4:40000093
[<c0082774>] (kmemleak_alloc+0x0/0x2b0) from [<c0080fc0>] (__kmalloc_node+0xac/0xb8)
[<c0080f14>] (__kmalloc_node+0x0/0xb8) from [<c006f88c>] (__krealloc+0x4c/0x74)
 r7:00000020 r6:00000004 r5:00000000 r4:00000000
[<c006f840>] (__krealloc+0x0/0x74) from [<c006f8dc>] (krealloc+0x28/0x48)
 r7:df41df00 r6:00000020 r5:00000000 r4:00000004
[<c006f8b4>] (krealloc+0x0/0x48) from [<c01c37a4>] (netlink_realloc_groups+0x68/0xb0)
 r5:df4b5270 r4:00000004
[<c01c373c>] (netlink_realloc_groups+0x0/0xb0) from [<c01c4320>] (netlink_bind+0x6c/0x148)
 r7:df41df00 r6:df4b5270 r5:bef07cc4 r4:df4b5000
[<c01c42b4>] (netlink_bind+0x0/0x148) from [<c01a61a0>] (sys_bind+0x6c/0x90)
 r7:df41df00 r6:0000000c r5:bef07cc4 r4:df4b5000
[<c01a6134>] (sys_bind+0x0/0x90) from [<c0020dc0>] (ret_fast_syscall+0x0/0x2c)
 r7:0000011a r6:bef07cc4 r5:00000004 r4:000204f4

Thanks.

-- 
Catalin


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 05/15] kmemleak: Add the slub memory allocation/freeing hooks
  2008-12-18 15:28         ` Catalin Marinas
@ 2008-12-18 16:05           ` Pekka Enberg
  0 siblings, 0 replies; 35+ messages in thread
From: Pekka Enberg @ 2008-12-18 16:05 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-kernel, Christoph Lameter

Hi Catalin,

Catalin Marinas wrote:
> Just to make sure I understand it correctly, the slab_free() fast path
> stores the pointer to the freed object into c->freelist. However, this
> object is no longer tracked by kmemleak because of the
> kmemleak_free_recursive() call at the beginning of this function (false
> negatives make sense only for allocated objects).

Indeed. For SLAB, it's a problem because the per-CPU cache pointer is
not cleared from the struct array_cache upon _allocation_, which is the
culprit of the false negative there.

Catalin Marinas wrote:
> Is my understanding correct? Thanks.

Yes, it is and I was just confused. Thanks!

		Pekka

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 05/15] kmemleak: Add the slub memory allocation/freeing hooks
  2008-12-18 10:51       ` Pekka Enberg
@ 2008-12-18 15:28         ` Catalin Marinas
  2008-12-18 16:05           ` Pekka Enberg
  0 siblings, 1 reply; 35+ messages in thread
From: Catalin Marinas @ 2008-12-18 15:28 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: linux-kernel, Christoph Lameter

Hi Pekka,

On Thu, 2008-12-18 at 12:51 +0200, Pekka Enberg wrote:
> On Fri, 2008-12-12 at 13:45 +0000, Catalin Marinas wrote:
> > Pekka Enberg wrote:
> > > Hmm, I'm not sure I understand why struct kmem_cache_cpu ->freelist is 
> > > never scanned. 
> >
> > Why would the ->freelist be a problem? I don't fully understand the slub
> > allocator. Aren't objects added to the freelist only after they were
> > freed? In __slab_alloc there seems to be a line:
> > 
> > c->page->freelist = NULL;
> > 
> > so the freelist won't count as a reference anymore. After freeing an
> > object, kmemleak no longer cares about references to it.
> 
> I think we're talking about two different things here. Don't we then
> have false negatives because we reach ->freelist of struct
> kmem_cache_cpu which contains a pointer to an object that is free'd
> (take a look at slab_free() fast-path)?

Just to make sure I understand it correctly, the slab_free() fast path
stores the pointer to the freed object into c->freelist. However, this
object is no longer tracked by kmemleak because of the
kmemleak_free_recursive() call at the beginning of this function (false
negatives make sense only for allocated objects).

On the slab_alloc() fast path, the pointer to an allocated object is
obtained from c->freelist, but this seems to be overwritten by the
pointer to the next free object, object[c->offset], which isn't yet
tracked by kmemleak. So, during a memory scan, it shouldn't matter that
the kmem_cache_cpu structures are scanned, as they don't contain any
pointer to an allocated (not free) object.

The new slabs are allocated with alloc_pages() and these are not tracked
by kmemleak.

Is my understanding correct? Thanks.

-- 
Catalin


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 05/15] kmemleak: Add the slub memory allocation/freeing hooks
  2008-12-12 13:45     ` Catalin Marinas
@ 2008-12-18 10:51       ` Pekka Enberg
  2008-12-18 15:28         ` Catalin Marinas
  0 siblings, 1 reply; 35+ messages in thread
From: Pekka Enberg @ 2008-12-18 10:51 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-kernel, Christoph Lameter

Hi Catalin,

On Fri, 2008-12-12 at 13:45 +0000, Catalin Marinas wrote:
> > Hmm, I'm not sure I understand why struct kmem_cache_cpu ->freelist is 
> > never scanned. 
> 
> Did you get any false positives? Or were you expecting false negatives
> because of freelist scanning which never occurred?

I haven't tested kmemleak so I'm just commenting on the code. I was
thinking about false negatives, not false positives.

On Fri, 2008-12-12 at 13:45 +0000, Catalin Marinas wrote:
> > For SMP, I suppose kmemleak doesn't scan the per-CPU 
> > areas?
> 
> It should scan the per-CPU areas in the memleak_scan() function:
> 
> #ifdef CONFIG_SMP
> 	/* per-cpu sections scanning */
> 	for_each_possible_cpu(i)
> 		scan_block(__per_cpu_start + per_cpu_offset(i),
> 			   __per_cpu_end + per_cpu_offset(i), NULL);
> #endif
> 
> >  But for UP, struct kmem_cache is allocated with kmalloc() and 
> > that contains struct kmem_cache_cpu as well.
> 
> They should be scanned as well.
>
> > And I suppose we never scan struct pages either. Otherwise ->freelist 
> > there would be a problem as well.
> 
> It was scanning the mem_map arrays in the past but I removed this part
> and haven't seen any problems (on ARM).
> 
> Why would the ->freelist be a problem? I don't fully understand the slub
> allocator. Aren't objects added to the freelist only after they were
> freed? In __slab_alloc there seems to be a line:
> 
> c->page->freelist = NULL;
> 
> so the freelist won't count as a reference anymore. After freeing an
> object, kmemleak no longer cares about references to it.

I think we're talking about two different things here. Don't we then
have false negatives because we reach ->freelist of struct
kmem_cache_cpu which contains a pointer to an object that is free'd
(take a look at slab_free() fast-path)?

		Pekka


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 05/15] kmemleak: Add the slub memory allocation/freeing hooks
  2008-12-11 21:30   ` Pekka Enberg
@ 2008-12-12 13:45     ` Catalin Marinas
  2008-12-18 10:51       ` Pekka Enberg
  0 siblings, 1 reply; 35+ messages in thread
From: Catalin Marinas @ 2008-12-12 13:45 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: linux-kernel, Christoph Lameter

On Thu, 2008-12-11 at 23:30 +0200, Pekka Enberg wrote:
> Catalin Marinas wrote:
> > This patch adds the callbacks to memleak_(alloc|free) functions from the
> > slub allocator.
> > 
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Christoph Lameter <cl@linux-foundation.org>
> > Cc: Pekka Enberg <penberg@cs.helsinki.fi>
> 
> Hmm, I'm not sure I understand why struct kmem_cache_cpu ->freelist is 
> never scanned. 

Did you get any false positives? Or were you expecting false negatives
because of freelist scanning which never occurred?

> For SMP, I suppose kmemleak doesn't scan the per-CPU 
> areas?

It should scan the per-CPU areas in the memleak_scan() function:

#ifdef CONFIG_SMP
	/* per-cpu sections scanning */
	for_each_possible_cpu(i)
		scan_block(__per_cpu_start + per_cpu_offset(i),
			   __per_cpu_end + per_cpu_offset(i), NULL);
#endif

>  But for UP, struct kmem_cache is allocated with kmalloc() and 
> that contains struct kmem_cache_cpu as well.

They should be scanned as well.

> And I suppose we never scan struct pages either. Otherwise ->freelist 
> there would be a problem as well.

It was scanning the mem_map arrays in the past but I removed this part
and haven't seen any problems (on ARM).

Why would the ->freelist be a problem? I don't fully understand the slub
allocator. Aren't objects added to the freelist only after they were
freed? In __slab_alloc there seems to be a line:

c->page->freelist = NULL;

so the freelist won't count as a reference anymore. After freeing an
object, kmemleak no longer cares about references to it.

-- 
Catalin


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 05/15] kmemleak: Add the slub memory allocation/freeing hooks
  2008-12-10 18:27 ` [PATCH 05/15] kmemleak: Add the slub memory allocation/freeing hooks Catalin Marinas
@ 2008-12-11 21:30   ` Pekka Enberg
  2008-12-12 13:45     ` Catalin Marinas
  0 siblings, 1 reply; 35+ messages in thread
From: Pekka Enberg @ 2008-12-11 21:30 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-kernel, Christoph Lameter

Catalin Marinas wrote:
> This patch adds the callbacks to memleak_(alloc|free) functions from the
> slub allocator.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Christoph Lameter <cl@linux-foundation.org>
> Cc: Pekka Enberg <penberg@cs.helsinki.fi>

Hmm, I'm not sure I understand why struct kmem_cache_cpu ->freelist is 
never scanned. For SMP, I suppose kmemleak doesn't scan the per-CPU 
areas? But for UP, struct kmem_cache is allocated with kmalloc() and 
that contains struct kmem_cache_cpu as well.

And I suppose we never scan struct pages either. Otherwise ->freelist 
there would be a problem as well.

		Pekka

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 05/15] kmemleak: Add the slub memory allocation/freeing hooks
  2008-12-10 18:26 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
@ 2008-12-10 18:27 ` Catalin Marinas
  2008-12-11 21:30   ` Pekka Enberg
  0 siblings, 1 reply; 35+ messages in thread
From: Catalin Marinas @ 2008-12-10 18:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Pekka Enberg, Christoph Lameter

This patch adds the callbacks to memleak_(alloc|free) functions from the
slub allocator.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
---
 mm/slub.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 749588a..d9b07cb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -18,6 +18,7 @@
 #include <linux/seq_file.h>
 #include <linux/cpu.h>
 #include <linux/cpuset.h>
+#include <linux/memleak.h>
 #include <linux/mempolicy.h>
 #include <linux/ctype.h>
 #include <linux/debugobjects.h>
@@ -140,7 +141,7 @@
  * Set of flags that will prevent slab merging
  */
 #define SLUB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
-		SLAB_TRACE | SLAB_DESTROY_BY_RCU)
+		SLAB_TRACE | SLAB_DESTROY_BY_RCU | SLAB_NOLEAKTRACE)
 
 #define SLUB_MERGE_SAME (SLAB_DEBUG_FREE | SLAB_RECLAIM_ACCOUNT | \
 		SLAB_CACHE_DMA)
@@ -1608,6 +1609,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
 	if (unlikely((gfpflags & __GFP_ZERO) && object))
 		memset(object, 0, objsize);
 
+	memleak_alloc_recursive(object, objsize, 1, s->flags, gfpflags);
 	return object;
 }
 
@@ -1710,6 +1712,7 @@ static __always_inline void slab_free(struct kmem_cache *s,
 	struct kmem_cache_cpu *c;
 	unsigned long flags;
 
+	memleak_free_recursive(x, s->flags);
 	local_irq_save(flags);
 	c = get_cpu_slab(s, smp_processor_id());
 	debug_check_no_locks_freed(object, c->objsize);


^ permalink raw reply related	[flat|nested] 35+ messages in thread

Thread overview: 35+ messages
2008-11-29 10:43 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
2008-11-29 10:43 ` [PATCH 01/15] kmemleak: Add the base support Catalin Marinas
2008-12-01  5:39   ` Yasunori Goto
2008-12-01  6:49     ` Pekka Enberg
2008-12-01  8:11       ` Yasunori Goto
2008-12-01 12:29       ` Catalin Marinas
2008-12-01  7:15   ` Pekka Enberg
2008-12-02 21:28   ` Andrew Morton
2008-12-04 18:50     ` Catalin Marinas
2008-12-04 19:03       ` Andrew Morton
2008-12-18  9:28     ` Catalin Marinas
2008-11-29 10:43 ` [PATCH 02/15] kmemleak: Add documentation on the memory leak detector Catalin Marinas
2008-11-29 10:43 ` [PATCH 03/15] kmemleak: Add the slab memory allocation/freeing hooks Catalin Marinas
2008-11-29 10:43 ` [PATCH 04/15] kmemleak: Add the slob " Catalin Marinas
2008-11-30 17:30   ` Matt Mackall
2008-11-29 10:43 ` [PATCH 05/15] kmemleak: Add the slub " Catalin Marinas
2008-11-29 11:46   ` Pekka Enberg
2008-11-29 10:43 ` [PATCH 06/15] kmemleak: Add the vmalloc " Catalin Marinas
2008-11-29 10:43 ` [PATCH 07/15] kmemleak: Add memleak_alloc callback from alloc_large_system_hash Catalin Marinas
2008-11-29 10:43 ` [PATCH 08/15] kmemleak: Add modules support Catalin Marinas
2008-11-29 10:43 ` [PATCH 09/15] x86: Provide _sdata in the vmlinux_*.lds.S files Catalin Marinas
2008-11-29 10:44 ` [PATCH 10/15] arm: Provide _sdata and __bss_stop in the vmlinux.lds.S file Catalin Marinas
2008-11-29 10:44 ` [PATCH 11/15] kmemleak: Remove some of the kmemleak false positives Catalin Marinas
2008-11-29 11:48   ` Pekka Enberg
2008-11-30 19:02     ` Catalin Marinas
2008-11-29 10:44 ` [PATCH 12/15] kmemleak: Enable the building of the memory leak detector Catalin Marinas
2008-11-29 10:44 ` [PATCH 13/15] kmemleak: Keep the __init functions after initialization Catalin Marinas
2008-11-29 10:44 ` [PATCH 14/15] kmemleak: Simple testing module for kmemleak Catalin Marinas
2008-11-29 10:44 ` [PATCH 15/15] kmemleak: Add the corresponding MAINTAINERS entry Catalin Marinas
2008-12-10 18:26 [PATCH 00/15] Kernel memory leak detector Catalin Marinas
2008-12-10 18:27 ` [PATCH 05/15] kmemleak: Add the slub memory allocation/freeing hooks Catalin Marinas
2008-12-11 21:30   ` Pekka Enberg
2008-12-12 13:45     ` Catalin Marinas
2008-12-18 10:51       ` Pekka Enberg
2008-12-18 15:28         ` Catalin Marinas
2008-12-18 16:05           ` Pekka Enberg
