linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 0/2] object debugging infrastructure
@ 2008-03-01 10:24 Thomas Gleixner
  2008-03-01 10:24 ` [patch 1/2] infrastructure to debug (dynamic) objects Thomas Gleixner
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Thomas Gleixner @ 2008-03-01 10:24 UTC (permalink / raw)
  To: LKML; +Cc: Andrew Morton, Ingo Molnar, Greg KH

We can see an ever repeating problem pattern with objects of any kind in
the kernel:

1) free of active objects
2) reinitialization of active objects

Both problems can be hard to debug because the symptoms surface at a
point where we have no chance to decode the root cause anymore. One
problem spot are kernel timers, where the detection of the problem
often happens in interrupt context and usually causes the machine to
panic.

While working on a timer related bug report I had to hack in
specialized code into the timer subsystem to get a reasonable hint for
the root cause. This debug hack was fine for temporary use, but far
from a mergeable solution due to the intrusiveness into the timer code
and the lack of the ability to detect and report the root cause
instantly and at the same time keeping the system operational so that
we have a decent chance to get hold of the debug information without
special debugging aids like serial consoles.

The problems described above are not restricted to timers, but timers
tend to expose it usually in a full system crash. Other objects are
less explosive, but the symptoms caused by such mistakes can be even
harder to debug.

Instead of creating specialized debugging code for the timer subsystem
a generic infrastructure is created which allows developers to verify
their code and provides an easy to enable debug facility for users in
case of trouble.

The debugobjects core code keeps track of operations on static and
dynamic objects by inserting them into a hashed list and sanity
checking them on object operations and providing additional checks
whenever kernel memory is freed.

The tracked object operations are:
- initializing an object
- adding an object to a subsystem list
- deleting an object from a subsystem list

Each operation is sanity checked before the operation is executed and
the subsystem specific code can provide a fixup function which prevents
the damage of the operation. When the sanity check triggers a warning
message and a stack trace is printed.

The list of operations can be extended if the need arises. For now it's
limited to the requirements of the first user (timers).

The core code enqueues the objects into hash buckets. The hash index
is generated from the address of the object to simplify the lookup for
the check on k/vfree. Each bucket has it's own spinlock to avoid
contention on a global lock.

The debug code can be compiled in without being active. The runtime
overhead is minimal and could be optimized by asm alternatives. A
kernel command line option enables the debugging code.

A recent real world debugging session shows the usefulness:

Initial bug report:

BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [<c0131f3e>] get_next_timer_interrupt+0xfe/0x210
....
Call Trace:
 [<c014627c>] ? tick_nohz_stop_sched_tick+0x13c/0x350
....

The reporter was asked to enable list debugging:

kernel BUG at lib/list_debug.c:33!
...
 [<c0131776>] ? internal_add_timer+0x36/0xb0
 [<c01434ca>] ? clocksource_watchdog+0x22a/0x240
....

Enabling slab poisoning identified the problem as use after free:

BUG: unable to handle kernel paging request at 6b6b6b6b
IP: [<c0131f3e>] get_next_timer_interrupt+0xfe/0x210
....

The above debug informations had to be transscripted from the monitor
due to a full machine crash and at no point it gave any useful hint of
the root cause.

Enabling the object debugging code gave:

Feb 27 07:24:22 kkbox kernel: ODEBUG: free active object: timer_list
Feb 27 07:24:22 kkbox kernel: WARNING: at lib/debugobjects.c:63 debug_print_object+0x64/0x66()
...
Feb 27 07:24:22 kkbox kernel:  [<c01f57e6>] debug_print_object+0x64/0x66
Feb 27 07:24:22 kkbox kernel:  [<c01f5860>] debug_check_no_obj_freed+0x78/0xd5
Feb 27 07:24:22 kkbox kernel:  [<c017dc04>] kfree+0x6c/0xce
Feb 27 07:24:22 kkbox kernel:  [<f9077cbb>] ? l2cap_conn_del+0x5c/0x64 [l2cap]
Feb 27 07:24:22 kkbox kernel:  [<f9077cbb>] l2cap_conn_del+0x5c/0x64 [l2cap]
Feb 27 07:24:22 kkbox kernel:  [<f9077ce5>] l2cap_disconn_ind+0x22/0x27 [l2cap]
Feb 27 07:24:22 kkbox kernel:  [<f8e552fd>] hci_event_packet+0x628/0x19b7 [bluetooth]

The debug information was retrieved from the syslog on a full operational
system, because the fixup code deactivated the timer before returning to
the calling code. The location of the culprit was exactly pointed out by
the stack trace.

Comments, suggestions are welcome as usual.

Thanks,
	tglx

-- 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch 1/2] infrastructure to debug (dynamic) objects
  2008-03-01 10:24 [patch 0/2] object debugging infrastructure Thomas Gleixner
@ 2008-03-01 10:24 ` Thomas Gleixner
  2008-03-01 10:51   ` Andrew Morton
  2008-03-01 10:25 ` [patch 2/2] add timer specific object debugging code Thomas Gleixner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2008-03-01 10:24 UTC (permalink / raw)
  To: LKML; +Cc: Andrew Morton, Ingo Molnar, Greg KH

[-- Attachment #1: debug-dynamic-objects.patch --]
[-- Type: text/plain, Size: 15724 bytes --]

We can see an ever repeating problem pattern with objects of any kind in
the kernel:

1) free of active objects
2) reinitialization of active objects

Both problems can be hard to debug because the crash happens at a
point where we have no chance to decode the root cause anymore. One
problem spot are kernel timers, where the detection of the problem
often happens in interrupt context and usually causes the machine to
panic.

While working on a timer related bug report I had to hack in
specialized code into the timer subsystem to get a reasonable hint for
the root cause. This debug hack was fine for temporary use, but far
from a mergeable solution due to the intrusiveness into the timer code
and the lack of the ability to detect and report the root cause
instantly and at the same time keeping the system operational so that
we have a decent chance to get hold of the debug information without
special debugging aids like serial consoles.

The problems described above are not restricted to timers, but timers
tend to expose it usually in a full system crash. Other objects are
less explosive, but the symptoms caused by such mistakes can be even
harder to debug.

Instead of creating specialized debugging code for the timer subsystem
a generic infrastructure is created which allows developers to verify
their code and provides an easy to enable debug facility for users in
case of trouble.

The debugobjects core code keeps track of operations on static and
dynamic objects by inserting them into a hashed list and sanity
checking them on object operations and provides additional checks
whenever kernel memory is freed.

The tracked object operations are:
- initializing an object
- adding an object to a subsystem list
- deleting an object from a subsystem list

Each operation is sanity checked before the operation is executed and
the subsystem specific code can provide a fixup function which prevents
the damage of the operation. When the sanity check triggers a warning
message and a stack trace is printed.

The list of operations can be extended if the need arises. For now it's
limited to the requirements of the first user (timers).

The core code enqueues the objects into hash buckets. The hash index
is generated from the address of the object to simplify the lookup for
the check on k/vfree. Each bucket has it's own spinlock to avoid
contention on a global lock.

The debug code can be compiled in without being active. The runtime
overhead is minimal and could be optimized by asm alternatives. A
kernel command line option enables the debugging code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 Documentation/kernel-parameters.txt |    2 
 include/linux/debugobjects.h        |   47 ++++++
 include/linux/mm.h                  |    1 
 init/main.c                         |    2 
 lib/Kconfig.debug                   |   16 ++
 lib/Makefile                        |    1 
 lib/debugobjects.c                  |  256 ++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c                     |    9 -
 mm/slab.c                           |    2 
 mm/slub.c                           |    1 
 mm/vmalloc.c                        |    1 
 11 files changed, 336 insertions(+), 2 deletions(-)

Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -554,6 +554,8 @@ and is between 256 and 4096 characters. 
 			1 will print _a lot_ more information - normally
 			only useful to kernel developers.
 
+	debug_objects	[KNL] Enable object debugging
+
 	decnet.addr=	[HW,NET]
 			Format: <area>[,<node>]
 			See also Documentation/networking/decnet.txt.
Index: linux-2.6/include/linux/debugobjects.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/debugobjects.h
@@ -0,0 +1,47 @@
+#ifndef _LINUX_DEBUGOBJECTS_H
+#define _LINUX_DEBUGOBJECTS_H
+
+#include <linux/list.h>
+#include <linux/spinlock.h>
+
+enum debug_obj_op {
+	ODEBUG_INIT,
+	ODEBUG_ADD,
+	ODEBUG_DEL,
+	ODEBUG_FREE,
+	ODEBUG_TEST,
+};
+
+enum {
+	ODEBUG_TYPE_UNKNOWN,
+	ODEBUG_MAX_TYPES
+};
+
+struct debug_obj {
+	struct list_head list;
+	unsigned int type;
+};
+
+#ifdef CONFIG_DEBUG_OBJECT_OPS
+
+extern void debug_object_op(struct debug_obj *obj, enum debug_obj_op op);
+extern void debug_objects_init(void);
+
+#else
+
+static inline void debug_object_op(struct debug_obj *obj, enum debug_obj_op op)
+{
+}
+
+static inline void debug_objects_init(void) { }
+
+#endif
+
+#ifdef CONFIG_DEBUG_OBJECT_FREE
+extern void debug_check_no_obj_freed(const void *address, unsigned long size);
+#else
+static inline void
+debug_check_no_obj_freed(const void *address, unsigned long size) { }
+#endif
+
+#endif
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -11,6 +11,7 @@
 #include <linux/rbtree.h>
 #include <linux/prio_tree.h>
 #include <linux/debug_locks.h>
+#include <linux/debugobjects.h>
 #include <linux/mm_types.h>
 
 struct mempolicy;
Index: linux-2.6/init/main.c
===================================================================
--- linux-2.6.orig/init/main.c
+++ linux-2.6/init/main.c
@@ -52,6 +52,7 @@
 #include <linux/unwind.h>
 #include <linux/buffer_head.h>
 #include <linux/debug_locks.h>
+#include <linux/debugobjects.h>
 #include <linux/lockdep.h>
 #include <linux/pid_namespace.h>
 #include <linux/device.h>
@@ -523,6 +524,7 @@ asmlinkage void __init start_kernel(void
 	 */
 	unwind_init();
 	lockdep_init();
+	debug_objects_init();
 	cgroup_init_early();
 
 	local_irq_disable();
Index: linux-2.6/lib/Kconfig.debug
===================================================================
--- linux-2.6.orig/lib/Kconfig.debug
+++ linux-2.6/lib/Kconfig.debug
@@ -183,6 +183,22 @@ config TIMER_STATS
 	  (it defaults to deactivated on bootup and will only be activated
 	  if some application like powertop activates it explicitly).
 
+config DEBUG_OBJECT_OPS
+       bool "Debug object operations"
+       depends on DEBUG_KERNEL
+       help
+	 If you say Y here, additional code will be inserted into the
+	 kernel to validate operations on various objects.
+
+config DEBUG_OBJECT_FREE
+       bool "Debug objects in freed memory"
+       depends on DEBUG_OBJECT_OPS
+       help
+         This enables checks whether a k/v free operation frees an area
+	 which contains an object which has not been deactivated
+	 properly.  This can make kmalloc/kfree-intensive workloads
+	 much slower.
+
 config DEBUG_SLAB
 	bool "Debug slab memory allocations"
 	depends on DEBUG_KERNEL && SLAB
Index: linux-2.6/lib/Makefile
===================================================================
--- linux-2.6.orig/lib/Makefile
+++ linux-2.6/lib/Makefile
@@ -36,6 +36,7 @@ obj-$(CONFIG_LOCK_KERNEL) += kernel_lock
 obj-$(CONFIG_PLIST) += plist.o
 obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o
 obj-$(CONFIG_DEBUG_LIST) += list_debug.o
+obj-$(CONFIG_DEBUG_OBJECT_OPS) += debugobjects.o
 
 ifneq ($(CONFIG_HAVE_DEC_LOCK),y)
   lib-y += dec_and_lock.o
Index: linux-2.6/lib/debugobjects.c
===================================================================
--- /dev/null
+++ linux-2.6/lib/debugobjects.c
@@ -0,0 +1,256 @@
+/*
+ * Generic infrastructure for debugging of objects.
+ *
+ * Started by Thomas Gleixner
+ *
+ * Copyright (C) 2008, Thomas Gleixner <tglx@linutronix.de>
+ *
+ *
+ * For licencing details see kernel-base/COPYING
+ */
+
+#include <linux/debugfs.h>
+#include <linux/debugobjects.h>
+#include <linux/seq_file.h>
+
+#define ODEBUG_HASH_SIZE	4096
+#define ODEBUG_HASH_MASK	(ODEBUG_HASH_SIZE - 1)
+
+struct odebug {
+	struct list_head list;
+	spinlock_t lock;
+};
+
+static struct odebug obj_hash[ODEBUG_HASH_SIZE];
+
+static int debug_objects_selftest __read_mostly;
+static int debug_objects_maxchain __read_mostly;
+static int debug_objects_fixups __read_mostly;
+static int debug_objects_enabled __read_mostly;
+
+static int __init enable_object_debug(char *str)
+{
+	debug_objects_enabled = 1;
+	return 0;
+}
+
+early_param("debug_objects", enable_object_debug);
+
+/*
+ * We use the pfn of the address for the hash. That way we can check
+ * for freed objects simply by checking the affected bucket.
+ */
+static struct odebug *object_get_hash(unsigned long addr)
+{
+	unsigned long hash = (addr >> PAGE_SHIFT) & ODEBUG_HASH_MASK;
+
+	return &obj_hash[hash];
+}
+
+static const void * const debug_fixup[ODEBUG_MAX_TYPES] = {
+};
+
+static const char * const obj_types[ODEBUG_MAX_TYPES] = {
+
+	[ODEBUG_TYPE_UNKNOWN] = "unknown type",
+};
+
+static void debug_print_object(struct debug_obj *obj, char *msg)
+{
+	static int once;
+
+	if (obj->type >= ODEBUG_MAX_TYPES)
+		obj->type = ODEBUG_TYPE_UNKNOWN;
+
+	if (!once && !debug_objects_selftest) {
+		once = 1;
+		printk(KERN_ERR "ODEBUG: %s: %p %s\n", msg, obj,
+		       obj_types[obj->type]);
+		WARN_ON(1);
+	}
+}
+
+static int debug_object_fixup(struct debug_obj *obj,  enum debug_obj_op op)
+{
+	int (*fixup_fn)(struct debug_obj *obj, enum debug_obj_op op);
+
+	/*
+	 * Try to repair the damage, so we have a better chance to get
+	 * useful debug output.
+	 */
+	fixup_fn = debug_fixup[obj->type];
+	if (!fixup_fn)
+		return -1;
+
+	return fixup_fn(obj, op);
+}
+
+void __debug_object_op(struct debug_obj *obj, enum debug_obj_op op)
+{
+	struct odebug *od = object_get_hash((unsigned long) obj);
+	struct debug_obj *dobj;
+	unsigned long flags;
+	int fixup = 0, cnt = 0;
+
+	spin_lock_irqsave(&od->lock, flags);
+
+	list_for_each_entry(dobj, &od->list, list) {
+		cnt++;
+		if (dobj == obj)
+			break;
+	}
+
+	if (cnt > debug_objects_maxchain)
+		debug_objects_maxchain = cnt;
+
+	/* New object ? */
+	if (dobj != obj ) {
+		switch (op) {
+		case ODEBUG_ADD:
+			list_add(&obj->list, &od->list);
+			break;
+		case ODEBUG_DEL:
+			debug_print_object(obj, "delete inactive object");
+			fixup = 1;
+			break;
+		default:
+			break;
+		}
+	} else {
+		switch (op) {
+		case ODEBUG_INIT:
+			debug_print_object(obj, "init active object");
+			fixup = 1;
+			break;
+		case ODEBUG_ADD:
+			debug_print_object(obj, "add active object");
+			fixup = 1;
+			break;
+		case ODEBUG_DEL:
+			list_del(&obj->list);
+			break;
+		default:
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&od->lock, flags);
+
+	if (!fixup)
+		return;
+
+	/* try to fixup the wreckage */
+	if (!debug_object_fixup(obj, op)) {
+		debug_objects_fixups++;
+		spin_lock_irqsave(&od->lock, flags);
+		switch (op) {
+		case ODEBUG_ADD:
+			list_add(&obj->list, &od->list);
+			break;
+		default:
+			break;
+		}
+		spin_unlock_irqrestore(&od->lock, flags);
+	}
+}
+
+/**
+ * debug_object_op - check object operation
+ * @obj:	object to check
+ * @op:		operation
+ */
+void debug_object_op(struct debug_obj *obj, enum debug_obj_op op)
+{
+	if (debug_objects_enabled)
+		__debug_object_op(obj, op);
+}
+
+#ifdef CONFIG_DEBUG_OBJECT_FREE
+static void __debug_check_no_obj_freed(const void *address, unsigned long size)
+{
+	unsigned long flags, oaddr, saddr, eaddr, paddr;
+	struct debug_obj *dobj, *tmp;
+	struct odebug *od;
+	unsigned int cnt = 0, pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+	saddr = (unsigned long) address;
+	eaddr = saddr + size;
+	paddr = saddr & PAGE_MASK;
+
+	for (;pages > 0; pages--, paddr += PAGE_SIZE) {
+		od = object_get_hash(paddr);
+
+		spin_lock_irqsave(&od->lock, flags);
+		list_for_each_entry_safe(dobj, tmp, &od->list, list) {
+			cnt++;
+			oaddr = (unsigned long) dobj;
+			if (oaddr < saddr || oaddr >= eaddr)
+				continue;
+			debug_print_object(dobj, "active object freed");
+			spin_unlock_irqrestore(&od->lock, flags);
+			debug_object_fixup(dobj, ODEBUG_FREE);
+			spin_lock_irqsave(&od->lock, flags);
+		}
+		spin_unlock_irqrestore(&od->lock, flags);
+		if (cnt > debug_objects_maxchain)
+			debug_objects_maxchain = cnt;
+	}
+}
+
+void debug_check_no_obj_freed(const void *address, unsigned long size)
+{
+	if (debug_objects_enabled)
+		__debug_check_no_obj_freed(address, size);
+}
+#endif
+
+#ifdef CONFIG_DEBUG_FS
+
+static int debug_objects_show(struct seq_file *m, void *v)
+{
+	seq_printf(m, "max_chain     :%d\n", debug_objects_maxchain);
+	seq_printf(m, "fixups        :%d\n", debug_objects_fixups);
+	return 0;
+}
+
+static int debug_objects_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, debug_objects_show, NULL);
+}
+
+static const struct file_operations debug_objects_fops = {
+	.open		= debug_objects_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static void __init debug_objects_init_debugfs(void)
+{
+	debugfs_create_file("debug_objects_info", 0600, NULL, NULL,
+			    &debug_objects_fops);
+}
+
+#else
+static inline void debug_objects_init_debugfs(void) { }
+#endif
+
+void __init debug_objects_init(void)
+{
+	int i;
+
+	for (i = 0; i < ODEBUG_HASH_SIZE; i++) {
+		INIT_LIST_HEAD(&obj_hash[i].list);
+		spin_lock_init(&obj_hash[i].lock);
+	}
+}
+
+int __init debug_objects_do_selftest(void)
+{
+	if (!debug_objects_enabled)
+		return 0;
+
+	debug_objects_init_debugfs();
+	printk(KERN_INFO "ODEBUG: Selftest pass\n");
+	return 0;
+}
+__initcall(debug_objects_do_selftest);
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -524,8 +524,11 @@ static void __free_pages_ok(struct page 
 	if (reserved)
 		return;
 
-	if (!PageHighMem(page))
+	if (!PageHighMem(page)) {
 		debug_check_no_locks_freed(page_address(page),PAGE_SIZE<<order);
+		debug_check_no_obj_freed(page_address(page),
+					   PAGE_SIZE << order);
+	}
 	arch_free_page(page, order);
 	kernel_map_pages(page, 1 << order, 0);
 
@@ -986,8 +989,10 @@ static void free_hot_cold_page(struct pa
 	if (free_pages_check(page))
 		return;
 
-	if (!PageHighMem(page))
+	if (!PageHighMem(page)) {
 		debug_check_no_locks_freed(page_address(page), PAGE_SIZE);
+		debug_check_no_obj_freed(page_address(page), PAGE_SIZE);
+	}
 	VM_BUG_ON(page_get_page_cgroup(page));
 	arch_free_page(page, 0);
 	kernel_map_pages(page, 1, 0);
Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c
+++ linux-2.6/mm/slab.c
@@ -3766,6 +3766,7 @@ void kmem_cache_free(struct kmem_cache *
 
 	local_irq_save(flags);
 	debug_check_no_locks_freed(objp, obj_size(cachep));
+	debug_check_no_obj_freed(objp, obj_size(cachep));
 	__cache_free(cachep, objp);
 	local_irq_restore(flags);
 }
@@ -3791,6 +3792,7 @@ void kfree(const void *objp)
 	kfree_debugcheck(objp);
 	c = virt_to_cache(objp);
 	debug_check_no_locks_freed(objp, obj_size(c));
+	debug_check_no_obj_freed(objp, obj_size(c));
 	__cache_free(c, (void *)objp);
 	local_irq_restore(flags);
 }
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c
+++ linux-2.6/mm/slub.c
@@ -1725,6 +1725,7 @@ static __always_inline void slab_free(st
 
 	local_irq_save(flags);
 	debug_check_no_locks_freed(object, s->objsize);
+	debug_check_no_obj_freed(object, s->objsize);
 	c = get_cpu_slab(s, smp_processor_id());
 	if (likely(page == c->page && c->node >= 0)) {
 		object[c->offset] = c->freelist;
Index: linux-2.6/mm/vmalloc.c
===================================================================
--- linux-2.6.orig/mm/vmalloc.c
+++ linux-2.6/mm/vmalloc.c
@@ -383,6 +383,7 @@ static void __vunmap(const void *addr, i
 	}
 
 	debug_check_no_locks_freed(addr, area->size);
+	debug_check_no_obj_freed(addr, area->size);
 
 	if (deallocate_pages) {
 		int i;

-- 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch 2/2] add timer specific object debugging code
  2008-03-01 10:24 [patch 0/2] object debugging infrastructure Thomas Gleixner
  2008-03-01 10:24 ` [patch 1/2] infrastructure to debug (dynamic) objects Thomas Gleixner
@ 2008-03-01 10:25 ` Thomas Gleixner
  2008-03-02  5:20 ` [patch 0/2] object debugging infrastructure Greg KH
  2008-03-03 12:42 ` Andi Kleen
  3 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2008-03-01 10:25 UTC (permalink / raw)
  To: LKML; +Cc: Andrew Morton, Ingo Molnar, Greg KH

[-- Attachment #1: timer-debugging.patch --]
[-- Type: text/plain, Size: 7033 bytes --]

Add calls to the generic object debugging infrastructure and provide a
fixup function which allows to keep the system alive when recoverable
problems have been detected by the object debugging core code. Add a
selftest, which covers the two possible mistakes: free/init of an
active timer.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/debugobjects.h |    1 
 include/linux/timer.h        |   22 +++++++++++
 kernel/timer.c               |   85 +++++++++++++++++++++++++++++++++++++++++++
 lib/Kconfig.debug            |    7 +++
 lib/debugobjects.c           |   18 ++++++++-
 5 files changed, 131 insertions(+), 2 deletions(-)

Index: linux-2.6/include/linux/debugobjects.h
===================================================================
--- linux-2.6.orig/include/linux/debugobjects.h
+++ linux-2.6/include/linux/debugobjects.h
@@ -14,6 +14,7 @@ enum debug_obj_op {
 
 enum {
 	ODEBUG_TYPE_UNKNOWN,
+	ODEBUG_TYPE_TIMER,
 	ODEBUG_MAX_TYPES
 };
 
Index: linux-2.6/include/linux/timer.h
===================================================================
--- linux-2.6.orig/include/linux/timer.h
+++ linux-2.6/include/linux/timer.h
@@ -4,6 +4,7 @@
 #include <linux/list.h>
 #include <linux/ktime.h>
 #include <linux/stddef.h>
+#include <linux/debugobjects.h>
 
 struct tvec_base;
 
@@ -20,6 +21,9 @@ struct timer_list {
 	char start_comm[16];
 	int start_pid;
 #endif
+#ifdef CONFIG_DEBUG_OBJECT_TIMERS
+	struct debug_obj dobj;
+#endif
 };
 
 extern struct tvec_base boot_tvec_bases;
@@ -164,5 +168,23 @@ unsigned long __round_jiffies_relative(u
 unsigned long round_jiffies(unsigned long j);
 unsigned long round_jiffies_relative(unsigned long j);
 
+#ifdef CONFIG_DEBUG_OBJECT_TIMERS
+static inline void
+debug_timer_object_op(struct timer_list *timer, enum debug_obj_op mode)
+{
+	timer->dobj.type = ODEBUG_TYPE_TIMER;
+	debug_object_op(&timer->dobj, mode);
+}
+extern int timer_fixup_object(struct debug_obj *obj, enum debug_obj_op mode);
+extern int timer_debug_object_selftest(void);
+#else
+static inline void
+debug_timer_object_op(struct timer_list *timer, enum debug_obj_op mode)
+{
+}
+# define timer_fixup_object NULL
+static inline int timer_debug_object_selftest(void) { return 0; }
+#endif
+
 
 #endif
Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -320,6 +320,84 @@ static void timer_stats_account_timer(st
 static void timer_stats_account_timer(struct timer_list *timer) {}
 #endif
 
+#ifdef CONFIG_DEBUG_OBJECT_TIMERS
+
+static int timer_fixup_done __read_mostly;
+
+int timer_fixup_object(struct debug_obj *obj, enum debug_obj_op op)
+{
+	struct timer_list *timer = container_of(obj, struct timer_list, dobj);
+
+	switch (op) {
+	case ODEBUG_INIT:
+	case ODEBUG_FREE:
+		del_timer_sync(timer);
+		timer_fixup_done++;
+		return 0;
+	/*
+	 * These are fatal timer.c internal errors. No real way to
+	 * survive:
+	 */
+	case ODEBUG_ADD:
+	case ODEBUG_DEL:
+		return -1;
+	default:
+		return -1;
+	}
+}
+
+/*
+ * Test the two popular bugs:
+ *
+ * - reinit a timer which is enqueued
+ * - free a datastructure which contains an enqueued timer
+ */
+
+static void __init timer_debug_selftest_fn(unsigned long arg)
+{
+}
+
+int __init timer_debug_object_selftest(void)
+{
+	struct timer_list *timer;
+	int fixup_cnt = timer_fixup_done;
+
+	timer = kmalloc(sizeof(struct timer_list), GFP_KERNEL);
+	if (!timer)
+		return -ENOMEM;
+
+	setup_timer(timer, timer_debug_selftest_fn, 0);
+
+	timer->expires = jiffies + HZ;
+	add_timer(timer);
+
+	setup_timer(timer, timer_debug_selftest_fn, 0);
+	if (fixup_cnt == timer_fixup_done)
+		goto err;
+
+#ifdef CONFIG_DEBUG_OBJECTS_FREE
+	timer->expires = jiffies + HZ;
+	add_timer(timer);
+
+	fixup_cnt = timer_fixup_done;
+	kfree(timer);
+	timer = NULL;
+
+	if (fixup_cnt == timer_fixup_done)
+		goto err;
+#else
+	kfree(timer);
+#endif
+	return 0;
+
+err:
+	printk(KERN_ERR "TIMER: ODEBUG selftest failed\n");
+	kfree(timer);
+	return -1;
+}
+
+#endif
+
 /**
  * init_timer - initialize a timer.
  * @timer: the timer to be initialized
@@ -329,6 +407,8 @@ static void timer_stats_account_timer(st
  */
 void init_timer(struct timer_list *timer)
 {
+	debug_timer_object_op(timer, ODEBUG_INIT);
+
 	timer->entry.next = NULL;
 	timer->base = __raw_get_cpu_var(tvec_bases);
 #ifdef CONFIG_TIMER_STATS
@@ -351,6 +431,8 @@ static inline void detach_timer(struct t
 {
 	struct list_head *entry = &timer->entry;
 
+	debug_timer_object_op(timer, ODEBUG_DEL);
+
 	__list_del(entry->prev, entry->next);
 	if (clear_pending)
 		entry->next = NULL;
@@ -405,6 +487,8 @@ int __mod_timer(struct timer_list *timer
 		ret = 1;
 	}
 
+	debug_timer_object_op(timer, ODEBUG_ADD);
+
 	new_base = __get_cpu_var(tvec_bases);
 
 	if (base != new_base) {
@@ -450,6 +534,7 @@ void add_timer_on(struct timer_list *tim
 	BUG_ON(timer_pending(timer) || !timer->function);
 	spin_lock_irqsave(&base->lock, flags);
 	timer_set_base(timer, base);
+	debug_timer_object_op(timer, ODEBUG_ADD);
 	internal_add_timer(base, timer);
 	spin_unlock_irqrestore(&base->lock, flags);
 }
Index: linux-2.6/lib/Kconfig.debug
===================================================================
--- linux-2.6.orig/lib/Kconfig.debug
+++ linux-2.6/lib/Kconfig.debug
@@ -199,6 +199,13 @@ config DEBUG_OBJECT_FREE
 	 properly.  This can make kmalloc/kfree-intensive workloads
 	 much slower.
 
+config DEBUG_OBJECT_TIMERS
+	bool "Debug timer objects"
+       	depends on DEBUG_OBJECT_OPS
+       	help
+	  If you say Y here, additional code will be inserted into the
+	  timer routines to validate the timer operations.
+
 config DEBUG_SLAB
 	bool "Debug slab memory allocations"
 	depends on DEBUG_KERNEL && SLAB
Index: linux-2.6/lib/debugobjects.c
===================================================================
--- linux-2.6.orig/lib/debugobjects.c
+++ linux-2.6/lib/debugobjects.c
@@ -12,6 +12,7 @@
 #include <linux/debugfs.h>
 #include <linux/debugobjects.h>
 #include <linux/seq_file.h>
+#include <linux/timer.h>
 
 #define ODEBUG_HASH_SIZE	4096
 #define ODEBUG_HASH_MASK	(ODEBUG_HASH_SIZE - 1)
@@ -48,11 +49,13 @@ static struct odebug *object_get_hash(un
 }
 
 static const void * const debug_fixup[ODEBUG_MAX_TYPES] = {
+	[ODEBUG_TYPE_TIMER] = timer_fixup_object,
 };
 
 static const char * const obj_types[ODEBUG_MAX_TYPES] = {
 
 	[ODEBUG_TYPE_UNKNOWN] = "unknown type",
+	[ODEBUG_TYPE_TIMER] = "timer_list",
 };
 
 static void debug_print_object(struct debug_obj *obj, char *msg)
@@ -246,11 +249,22 @@ void __init debug_objects_init(void)
 
 int __init debug_objects_do_selftest(void)
 {
+	int res;
+
 	if (!debug_objects_enabled)
 		return 0;
 
 	debug_objects_init_debugfs();
-	printk(KERN_INFO "ODEBUG: Selftest pass\n");
-	return 0;
+
+	debug_objects_selftest = 1;
+
+	res = timer_debug_object_selftest();
+
+	debug_objects_selftest = 0;
+
+	if (!res)
+		printk(KERN_INFO "ODEBUG: Selftest pass\n");
+
+	return res;
 }
 __initcall(debug_objects_do_selftest);

-- 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 1/2] infrastructure to debug (dynamic) objects
  2008-03-01 10:24 ` [patch 1/2] infrastructure to debug (dynamic) objects Thomas Gleixner
@ 2008-03-01 10:51   ` Andrew Morton
  2008-03-01 11:44     ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2008-03-01 10:51 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Ingo Molnar, Greg KH

On Sat, 01 Mar 2008 10:24:56 -0000 Thomas Gleixner <tglx@linutronix.de> wrote:

> We can see an ever repeating problem pattern with objects of any kind in
> the kernel:
> 
> 1) free of active objects
> 2) reinitialization of active objects
> 
> Both problems can be hard to debug because the crash happens at a
> point where we have no chance to decode the root cause anymore. One
> problem spot are kernel timers, where the detection of the problem
> often happens in interrupt context and usually causes the machine to
> panic.
> 
> While working on a timer related bug report I had to hack in
> specialized code into the timer subsystem to get a reasonable hint for
> the root cause. This debug hack was fine for temporary use, but far
> from a mergeable solution due to the intrusiveness into the timer code
> and the lack of the ability to detect and report the root cause
> instantly and at the same time keeping the system operational so that
> we have a decent chance to get hold of the debug information without
> special debugging aids like serial consoles.
> 
> The problems described above are not restricted to timers, but timers
> tend to expose it usually in a full system crash. Other objects are
> less explosive, but the symptoms caused by such mistakes can be even
> harder to debug.
> 
> Instead of creating specialized debugging code for the timer subsystem
> a generic infrastructure is created which allows developers to verify
> their code and provides an easy to enable debug facility for users in
> case of trouble.
> 
> The debugobjects core code keeps track of operations on static and
> dynamic objects by inserting them into a hashed list and sanity
> checking them on object operations and provides additional checks
> whenever kernel memory is freed.
> 
> The tracked object operations are:
> - initializing an object
> - adding an object to a subsystem list
> - deleting an object from a subsystem list
> 
> Each operation is sanity checked before the operation is executed and
> the subsystem specific code can provide a fixup function which prevents
> the damage of the operation. When the sanity check triggers a warning
> message and a stack trace is printed.
> 
> The list of operations can be extended if the need arises. For now it's
> limited to the requirements of the first user (timers).
> 
> The core code enqueues the objects into hash buckets. The hash index
> is generated from the address of the object to simplify the lookup for
> the check on k/vfree. Each bucket has it's own spinlock to avoid
> contention on a global lock.
> 
> The debug code can be compiled in without being active. The runtime
> overhead is minimal and could be optimized by asm alternatives. A
> kernel command line option enables the debugging code.
> 

hm.

The enabled-via-command-line thing is a bit sad.  I guess runtime
switchability would be hard.

The proof of this pudding is "how many subsystems use it".  If things like
the fault-injection framework are a guide, the answer is "zero unless
Thomas does it".  Call me an experienced cynic.

> +enum debug_obj_op {
> +	ODEBUG_INIT,
> +	ODEBUG_ADD,
> +	ODEBUG_DEL,
> +	ODEBUG_FREE,
> +	ODEBUG_TEST,
> +};

Interface nit: five different API calls would be preferable to one call
with a mode argument.

> +enum {
> +	ODEBUG_TYPE_UNKNOWN,
> +	ODEBUG_MAX_TYPES
> +};

The need for each client to patch this table is regrettable.
	
> --- linux-2.6.orig/include/linux/mm.h
> +++ linux-2.6/include/linux/mm.h
> @@ -11,6 +11,7 @@
>  #include <linux/rbtree.h>
>  #include <linux/prio_tree.h>
>  #include <linux/debug_locks.h>
> +#include <linux/debugobjects.h>
>  #include <linux/mm_types.h>
>  
>  struct mempolicy;

I suspect this wasn't supposed to be here.

> Index: linux-2.6/lib/debugobjects.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6/lib/debugobjects.c
> @@ -0,0 +1,256 @@
> +/*
> + * Generic infrastructure for debugging of objects.
> + *
> + * Started by Thomas Gleixner
> + *
> + * Copyright (C) 2008, Thomas Gleixner <tglx@linutronix.de>
> + *
> + *
> + * For licencing details see kernel-base/COPYING
> + */
> +
> +#include <linux/debugfs.h>
> +#include <linux/debugobjects.h>
> +#include <linux/seq_file.h>
> +
> +#define ODEBUG_HASH_SIZE	4096

power-of-2 is said to be a very poor size for a hash table.

> +#define ODEBUG_HASH_MASK	(ODEBUG_HASH_SIZE - 1)
> +
> +struct odebug {
> +	struct list_head list;

hlist?

> +	spinlock_t lock;
> +};
> +
> +static struct odebug obj_hash[ODEBUG_HASH_SIZE];
> +
> +static int debug_objects_selftest __read_mostly;
> +static int debug_objects_maxchain __read_mostly;
> +static int debug_objects_fixups __read_mostly;
> +static int debug_objects_enabled __read_mostly;
> +
> +static int __init enable_object_debug(char *str)
> +{
> +	debug_objects_enabled = 1;
> +	return 0;
> +}
> +
> +early_param("debug_objects", enable_object_debug);

I can never remember whether these things need to return 0 or 1 on success.

<looks at debug_kernel(), quiet_kernel() and loglevel().  Shudders.
Moves on.>



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 1/2] infrastructure to debug (dynamic) objects
  2008-03-01 10:51   ` Andrew Morton
@ 2008-03-01 11:44     ` Thomas Gleixner
  2008-03-01 22:42       ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2008-03-01 11:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, Ingo Molnar, Greg KH

On Sat, 1 Mar 2008, Andrew Morton wrote:
> On Sat, 01 Mar 2008 10:24:56 -0000 Thomas Gleixner <tglx@linutronix.de> wrote:
> > The debug code can be compiled in without being active. The runtime
> > overhead is minimal and could be optimized by asm alternatives. A
> > kernel command line option enables the debugging code.
> > 
> 
> hm.
> 
> The enabled-via-command-line thing is a bit sad.  I guess runtime
> switchability would be hard.

Yes, keeping track of such stuff from some random state is not really
a good idea.
 
> The proof of this pudding is "how many subsystems use it".  If things like
> the fault-injection framework are a guide, the answer is "zero unless
> Thomas does it".  Call me an experienced cynic.

Hey, I'm as much a cynic as you. It just bothered me to hack this
solely into the timer code with the knowledge that there are other
potential users lurking. 

And I want to have it in the timer code for a simple reason: Everytime
when the timer list code explodes due to some stupid timer_list user,
I'm the moron who has to apply hackery to help the users to identify
the root cause. This happened more than 5 times in the last 3 month
and I'm fed up with keeping my ugly debug hackery up to date and
instruct bug reporters how to use it. The delta to get down to the
cause of the problem in the example case is: 

      one mail versus five.
      zero patch vs. finding and updating the old version

I have better ways to waste my sparse leisure time :)

You can be sure, that I will add the specific debug code to other
parts of the kernel whenever I trap over a problem which smells like
the kfree'd or reinitialized timer wreckage.
 
> > +enum debug_obj_op {
> > +	ODEBUG_INIT,
> > +	ODEBUG_ADD,
> > +	ODEBUG_DEL,
> > +	ODEBUG_FREE,
> > +	ODEBUG_TEST,
> > +};
> 
> Interface nit: five different API calls would be preferable to one call
> with a mode argument.

Will do.

> > +enum {
> > +	ODEBUG_TYPE_UNKNOWN,
> > +	ODEBUG_MAX_TYPES
> > +};
> 
> The need for each client to patch this table is regrettable.

I did not bother with a dynamic register interface before getting some
feedback on the idea in general. Will do when there is some agreement
that this is a useful infrastructure in general.

> > --- linux-2.6.orig/include/linux/mm.h
> > +++ linux-2.6/include/linux/mm.h
> > @@ -11,6 +11,7 @@
> >  #include <linux/rbtree.h>
> >  #include <linux/prio_tree.h>
> >  #include <linux/debug_locks.h>
> > +#include <linux/debugobjects.h>
> >  #include <linux/mm_types.h>
> >  
> >  struct mempolicy;
> 
> I suspect this wasn't supposed to be here.

Well, I'm lazy as hell and I found the debug_locks.h include already,
which was added for the same reason: not adding the include to all the
affected files in mm/

Btw. DEBUG_LOCK_ALLOC is a perfect candidate to move over to this as
the current code only detects locks held by the calling thread, while
debugobjects detects also a lock held by any owner.

> > +
> > +#define ODEBUG_HASH_SIZE	4096
> 
> power-of-2 is said to be a very poor size for a hash table.

The hash is not a randomized hash as one would expect. It's purely
generated from the object address to simplify the lookup during the
free check. So the power of 2 size is a good thing :) /me adds
comment.

> > +#define ODEBUG_HASH_MASK	(ODEBUG_HASH_SIZE - 1)
> > +
> > +struct odebug {
> > +	struct list_head list;
> 
> hlist?

Good point.
 
> > +	spinlock_t lock;
> > +};
> > +
> > +static struct odebug obj_hash[ODEBUG_HASH_SIZE];
> > +
> > +static int debug_objects_selftest __read_mostly;
> > +static int debug_objects_maxchain __read_mostly;
> > +static int debug_objects_fixups __read_mostly;
> > +static int debug_objects_enabled __read_mostly;
> > +
> > +static int __init enable_object_debug(char *str)
> > +{
> > +	debug_objects_enabled = 1;
> > +	return 0;
> > +}
> > +
> > +early_param("debug_objects", enable_object_debug);
> 
> I can never remember whether these things need to return 0 or 1 on success.

Same here. As usual I copied it from some other place assuming that it
was correct there. Will check.
 
> <looks at debug_kernel(), quiet_kernel() and loglevel().  Shudders.
> Moves on.>

Hehehe.

Thanks for the review,

       tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 1/2] infrastructure to debug (dynamic) objects
  2008-03-01 11:44     ` Thomas Gleixner
@ 2008-03-01 22:42       ` Peter Zijlstra
  2008-03-02 10:06         ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2008-03-01 22:42 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Andrew Morton, LKML, Ingo Molnar, Greg KH


On Sat, 2008-03-01 at 12:44 +0100, Thomas Gleixner wrote:
> On Sat, 1 Mar 2008, Andrew Morton wrote:

> > > +
> > > +#define ODEBUG_HASH_SIZE	4096
> > 
> > power-of-2 is said to be a very poor size for a hash table.
> 
> The hash is not a randomized hash as one would expect. It's purely
> generated from the object address to simplify the lookup during the
> free check. So the power of 2 size is a good thing :) /me adds
> comment.

Power of two buckets work when used along with a golden ratio based hash
map - as done by linux/hash.h.

Power of two hash maps otoh suck royally.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 0/2] object debugging infrastructure
  2008-03-01 10:24 [patch 0/2] object debugging infrastructure Thomas Gleixner
  2008-03-01 10:24 ` [patch 1/2] infrastructure to debug (dynamic) objects Thomas Gleixner
  2008-03-01 10:25 ` [patch 2/2] add timer specific object debugging code Thomas Gleixner
@ 2008-03-02  5:20 ` Greg KH
  2008-03-02  9:54   ` Thomas Gleixner
  2008-03-03 12:42 ` Andi Kleen
  3 siblings, 1 reply; 10+ messages in thread
From: Greg KH @ 2008-03-02  5:20 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Andrew Morton, Ingo Molnar

On Sat, Mar 01, 2008 at 10:24:52AM -0000, Thomas Gleixner wrote:
> We can see an ever repeating problem pattern with objects of any kind in
> the kernel:
> 
> 1) free of active objects
> 2) reinitialization of active objects

Ah, this looks nice.  For kobjects I would like to track the above, as
well as:
	- use of initialized objects
	- use of "freed" objects
	- objects that are never destroyed, yet the code controlling
	  them thinks they are.

I say "freed" as sometimes kobjects are in static structures and are not
in memory that ends up being kfree() so slab poisoning doesn't help.

Do you think that would be able to worked into this framework?  At first
glance, it seems like it would be easy to add, but would like to make
sure.

If so, I'll gladly add this to the kobjects to help with issues there.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 0/2] object debugging infrastructure
  2008-03-02  5:20 ` [patch 0/2] object debugging infrastructure Greg KH
@ 2008-03-02  9:54   ` Thomas Gleixner
  0 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2008-03-02  9:54 UTC (permalink / raw)
  To: Greg KH; +Cc: LKML, Andrew Morton, Ingo Molnar

On Sat, 1 Mar 2008, Greg KH wrote:
> On Sat, Mar 01, 2008 at 10:24:52AM -0000, Thomas Gleixner wrote:
> > We can see an ever repeating problem pattern with objects of any kind in
> > the kernel:
> > 
> > 1) free of active objects
> > 2) reinitialization of active objects
> 
> Ah, this looks nice.  For kobjects I would like to track the above, as
> well as:
> 	- use of initialized objects
> 	- use of "freed" objects
> 	- objects that are never destroyed, yet the code controlling
> 	  them thinks they are.
> 
> I say "freed" as sometimes kobjects are in static structures and are not
> in memory that ends up being kfree() so slab poisoning doesn't help.

Good point. I try to come up with the infrastructure for that.

> Do you think that would be able to worked into this framework?  At first
> glance, it seems like it would be easy to add, but would like to make
> sure.

Yes, that should be possible

> If so, I'll gladly add this to the kobjects to help with issues there.

kobjects came to my mind as well, when I was thinking about possible
use cases.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 1/2] infrastructure to debug (dynamic) objects
  2008-03-01 22:42       ` Peter Zijlstra
@ 2008-03-02 10:06         ` Thomas Gleixner
  0 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2008-03-02 10:06 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Andrew Morton, LKML, Ingo Molnar, Greg KH

On Sat, 1 Mar 2008, Peter Zijlstra wrote:
> On Sat, 2008-03-01 at 12:44 +0100, Thomas Gleixner wrote:
> > On Sat, 1 Mar 2008, Andrew Morton wrote:
> 
> > > > +
> > > > +#define ODEBUG_HASH_SIZE	4096
> > > 
> > > power-of-2 is said to be a very poor size for a hash table.
> > 
> > The hash is not a randomized hash as one would expect. It's purely
> > generated from the object address to simplify the lookup during the
> > free check. So the power of 2 size is a good thing :) /me adds
> > comment.
> 
> Power of two buckets work when used along with a golden ratio based hash
> map - as done by linux/hash.h.
> 
> Power of two hash maps otoh suck royally.

Well, using a real hash algorithm sucks royally if you need to do the
reverse check on *free(). With the direct object address to hash
bucket mapping we have usually only one bucket to look at, otherwise
we end up looking at multiple buckets and doing the list walk on each
of them. We have to decide which suckage is worse :)

Thanks,
	tglx


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 0/2] object debugging infrastructure
  2008-03-01 10:24 [patch 0/2] object debugging infrastructure Thomas Gleixner
                   ` (2 preceding siblings ...)
  2008-03-02  5:20 ` [patch 0/2] object debugging infrastructure Greg KH
@ 2008-03-03 12:42 ` Andi Kleen
  3 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2008-03-03 12:42 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Andrew Morton, Ingo Molnar, Greg KH, chris.mason

Thomas Gleixner <tglx@linutronix.de> writes:
>
> The debugobjects core code keeps track of operations on static and
> dynamic objects by inserting them into a hashed list and sanity
> checking them on object operations and providing additional checks
> whenever kernel memory is freed.

Nice idea to make this generic.

If you push this it would be also good to consider some variant
of the "crasher" code that is in SUSE Kernels to include with it. 

It was originally written by Chris Mason and it also makes it easier
to stabilize such bugs. The basic idea is that a background thread
allocates lots of slabs and pages always poisons them and checks them
in the background. If someone corrupts memory that is noticed earlier
then

Basically it extends what your patchkit is doing to free memory too.

Chris' old patch:
http://firstfloor.org/~andi/crasher-26.diff
(various variants of that have existed over time) 


-Andi

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2008-03-03 12:42 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-01 10:24 [patch 0/2] object debugging infrastructure Thomas Gleixner
2008-03-01 10:24 ` [patch 1/2] infrastructure to debug (dynamic) objects Thomas Gleixner
2008-03-01 10:51   ` Andrew Morton
2008-03-01 11:44     ` Thomas Gleixner
2008-03-01 22:42       ` Peter Zijlstra
2008-03-02 10:06         ` Thomas Gleixner
2008-03-01 10:25 ` [patch 2/2] add timer specific object debugging code Thomas Gleixner
2008-03-02  5:20 ` [patch 0/2] object debugging infrastructure Greg KH
2008-03-02  9:54   ` Thomas Gleixner
2008-03-03 12:42 ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).