linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] debugobject: add support for kref
@ 2013-10-24 10:09 Sebastian Andrzej Siewior
  2013-10-24 14:48 ` Thomas Gleixner
  0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-10-24 10:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, jic23, lars, gregkh, Sebastian Andrzej Siewior

I run a couple times into the case where "put" was caled on an already
cleanup object. The results was either nothing because "0" went back to
0xff…ff and release was not called a second time or some thing like this
with SLAB:

|edma-dma-engine edma-dma-engine.0: freeing channel for 57
|Slab corruption (Not tainted): kmalloc-256 start=edc4c8c0, len=256
|070: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkjkkkkkkkkkkk
|Single bit error detected. Probably bad RAM.
|Run a memory test tool.
|Prev obj: start=edc4c7c0, len=256
|000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
|010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
|Next obj: start=edc4c9c0, len=256
|000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
|010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk

which got me a little confused about the bit flip at first.
The reason for this was ressource counting that went wrong and a cleanup from
two places.
After the second time running into the same problem using on the same driver,
I was looking for something to help me to find the caller of it a little
quicker. Here is a debugobject extension to kref which seems to do the job.
At kref_init() the debugobject is initialized. There is no active state atm.
At kref_sub() time I check if the kref is still tracked by debugobject and if
it is everything continues as usual. Since we do not have any "static" objects
we do not have any no unknown objects. So each "unknown" object is one that is
accessed after its cleanup and this will trigger now:

|edma-dma-engine edma-dma-engine.0: freeing channel for 57
|------------[ cut here ]------------
|WARNING: CPU: 0 PID: 2095 at include/linux/kref.h:83 iio_buffer_put+0x84/0xa4 [industrialio]()
|kref_put: unknown reference ec925134, cleanup iio_buffer_release+0x0/0x24 [industrialio]
|Modules linked in: ti_am335x_adc(-)
|CPU: 0 PID: 2095 Comm: rmmod Not tainted 3.12.0-rc6+ #414
|[<c0014d38>] (unwind_backtrace+0x0/0xf4) from [<c001249c>] (show_stack+0x14/0x1c)
|[<c001249c>] (show_stack+0x14/0x1c) from [<c0037244>] (warn_slowpath_common+0x64/0x84)
|[<c0037244>] (warn_slowpath_common+0x64/0x84) from [<c00372f8>] (warn_slowpath_fmt+0x30/0x40)
|[<c00372f8>] (warn_slowpath_fmt+0x30/0x40) from [<bf0d74b4>] (iio_buffer_put+0x84/0xa4 [industrialio])
|[<bf0d74b4>] (iio_buffer_put+0x84/0xa4 [industrialio]) from [<bf0d4340>] (iio_dev_release+0x44/0x64 [industrialio])
|[<bf0d4340>] (iio_dev_release+0x44/0x64 [industrialio]) from [<c028ffb0>] (device_release+0x2c/0x94)
|[<c028ffb0>] (device_release+0x2c/0x94) from [<c021ee60>] (kobject_release+0x44/0x78)
|[<c021ee60>] (kobject_release+0x44/0x78) from [<c0295cb4>] (release_nodes+0x1a0/0x248)
|[<c0295cb4>] (release_nodes+0x1a0/0x248) from [<c0293204>] (__device_release_driver+0x98/0xf0)
|[<c0293204>] (__device_release_driver+0x98/0xf0) from [<c0293310>] (driver_detach+0xb4/0xb8)
|[<c0293310>] (driver_detach+0xb4/0xb8) from [<c02921e0>] (bus_remove_driver+0x98/0xec)
|[<c02921e0>] (bus_remove_driver+0x98/0xec) from [<c008fe1c>] (SyS_delete_module+0x1e0/0x24c)
|[<c008fe1c>] (SyS_delete_module+0x1e0/0x24c) from [<c000e680>] (ret_fast_syscall+0x0/0x48)
|---[ end trace 34ae5f26b0a9d556 ]---

which ease the search for double cleanup.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/debugobjects.h |  4 ++++
 include/linux/kref.h         | 10 ++++++++++
 lib/Kconfig.debug            |  8 ++++++++
 lib/debugobjects.c           | 42 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 64 insertions(+)

diff --git a/include/linux/debugobjects.h b/include/linux/debugobjects.h
index 98ffcbd..a71bb24 100644
--- a/include/linux/debugobjects.h
+++ b/include/linux/debugobjects.h
@@ -68,6 +68,8 @@ extern void debug_object_deactivate(void *addr, struct debug_obj_descr *descr);
 extern void debug_object_destroy   (void *addr, struct debug_obj_descr *descr);
 extern void debug_object_free      (void *addr, struct debug_obj_descr *descr);
 extern void debug_object_assert_init(void *addr, struct debug_obj_descr *descr);
+extern bool debug_object_is_tracked(void *addr);
+extern void debugobj_kref_splat(void *addr, void *func);
 
 /*
  * Active state:
@@ -95,6 +97,8 @@ static inline void
 debug_object_free      (void *addr, struct debug_obj_descr *descr) { }
 static inline void
 debug_object_assert_init(void *addr, struct debug_obj_descr *descr) { }
+static inline bool debug_object_is_tracked(void *addr) { return true };
+static inline void debugobj_kref_splat(void *addr, void *func) {};
 
 static inline void debug_objects_early_init(void) { }
 static inline void debug_objects_mem_init(void) { }
diff --git a/include/linux/kref.h b/include/linux/kref.h
index 484604d..3a878a1 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -20,6 +20,7 @@
 #include <linux/kernel.h>
 #include <linux/mutex.h>
 #include <linux/spinlock.h>
+#include <linux/debugobjects.h>
 
 struct kref {
 	atomic_t refcount;
@@ -32,6 +33,9 @@ struct kref {
 static inline void kref_init(struct kref *kref)
 {
 	atomic_set(&kref->refcount, 1);
+#ifdef CONFIG_DEBUG_OBJECTS_KREF
+	debug_object_init(kref, NULL);
+#endif
 }
 
 /**
@@ -70,6 +74,12 @@ static inline int kref_sub(struct kref *kref, unsigned int count,
 {
 	WARN_ON(release == NULL);
 
+#ifdef CONFIG_DEBUG_OBJECTS_KREF
+	if (!debug_object_is_tracked(kref)) {
+		debugobj_kref_splat(kref, release);
+		return 0;
+	}
+#endif
 	if (atomic_sub_and_test((int) count, &kref->refcount)) {
 		release(kref);
 		return 1;
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 06344d9..aa59596 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -375,6 +375,14 @@ config DEBUG_OBJECTS_PERCPU_COUNTER
 	  percpu counter routines to track the life time of percpu counter
 	  objects and validate the percpu counter operations.
 
+config DEBUG_OBJECTS_KREF
+	bool "Debug kref objects"
+	depends on DEBUG_OBJECTS
+	help
+	  If you say Y here, additional code will be insterted into the
+          kref_init() and kref_put() routines to track the lifetime of a kref
+	  object and ensure that an object is not released twice.
+
 config DEBUG_OBJECTS_ENABLE_DEFAULT
 	int "debug_objects bootup default value (0-1)"
         range 0 1
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index bf2c8b1..3488972 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -262,6 +262,21 @@ static void debug_print_object(struct debug_obj *obj, char *msg)
 	debug_objects_warnings++;
 }
 
+#ifdef CONFIG_DEBUG_OBJECTS_KREF
+void debugobj_kref_splat(void *addr, void *func)
+{
+	static int limit;
+
+	if (limit < 5) {
+		WARN(1, "kref_put: unknown reference %p, cleanup %pS\n",
+				addr, func);
+		limit++;
+	}
+	debug_objects_warnings++;
+}
+EXPORT_SYMBOL_GPL(debugobj_kref_splat);
+#endif
+
 /*
  * Try to repair the damage, so we have a better chance to get useful
  * debug output.
@@ -362,6 +377,7 @@ void debug_object_init(void *addr, struct debug_obj_descr *descr)
 
 	__debug_object_init(addr, descr, 0);
 }
+EXPORT_SYMBOL_GPL(debug_object_init);
 
 /**
  * debug_object_init_on_stack - debug checks when an object on stack is
@@ -575,6 +591,7 @@ void debug_object_free(void *addr, struct debug_obj_descr *descr)
 out_unlock:
 	raw_spin_unlock_irqrestore(&db->lock, flags);
 }
+EXPORT_SYMBOL_GPL(debug_object_free);
 
 /**
  * debug_object_assert_init - debug checks when object should be init-ed
@@ -614,6 +631,31 @@ void debug_object_assert_init(void *addr, struct debug_obj_descr *descr)
 	raw_spin_unlock_irqrestore(&db->lock, flags);
 }
 
+bool debug_object_is_tracked(void *addr)
+{
+	struct debug_bucket *db;
+	struct debug_obj *obj;
+	unsigned long flags;
+	bool ret;
+
+	if (!debug_objects_enabled)
+		return true;
+
+	db = get_bucket((unsigned long) addr);
+
+	raw_spin_lock_irqsave(&db->lock, flags);
+
+	obj = lookup_object(addr, db);
+	if (obj)
+		ret = true;
+	else
+		ret = false;
+
+	raw_spin_unlock_irqrestore(&db->lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(debug_object_is_tracked);
+
 /**
  * debug_object_active_state - debug checks object usage state machine
  * @addr:	address of the object
-- 
1.8.4.rc3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] debugobject: add support for kref
  2013-10-24 10:09 [RFC PATCH] debugobject: add support for kref Sebastian Andrzej Siewior
@ 2013-10-24 14:48 ` Thomas Gleixner
  2013-11-03 19:33   ` [PATCH v2] " Sebastian Andrzej Siewior
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2013-10-24 14:48 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-kernel, jic23, lars, gregkh

On Thu, 24 Oct 2013, Sebastian Andrzej Siewior wrote:
>  
>  struct kref {
>  	atomic_t refcount;
> @@ -32,6 +33,9 @@ struct kref {
>  static inline void kref_init(struct kref *kref)
>  {
>  	atomic_set(&kref->refcount, 1);
> +#ifdef CONFIG_DEBUG_OBJECTS_KREF
> +	debug_object_init(kref, NULL);

I rather prefer to have a debug_obj_descr handed in. It also should
activate the object, because then you can catch freeing of an active
kref as well. With state INIT the free checks are not triggering.

>  /**
> @@ -70,6 +74,12 @@ static inline int kref_sub(struct kref *kref, unsigned int count,
>  {
>  	WARN_ON(release == NULL);
>  
> +#ifdef CONFIG_DEBUG_OBJECTS_KREF
> +	if (!debug_object_is_tracked(kref)) {
> +		debugobj_kref_splat(kref, release);
> +		return 0;

Why can't you use debug_object_active_state(kref, descr, 0, 0)? All it
needs is a return value which tells you whether the object is tracked
and its state is ACTIVE.

> +	}
> +#endif
>  	if (atomic_sub_and_test((int) count, &kref->refcount)) {
>  		release(kref);
>  		return 1;

What calls debug_object_free() for the kref? You seem to add tracking
objects without ever releasing them. Are you relying on kfree removing
the inactive object from the debugobjects tracking list?

Not really a good idea. There might be some different release function
than kfree. And it relies on DEBUG_OBJECTS_FREE being enabled.

So an explicit call to debug_objects_free() before release(kref) is
required. Otherwise your new check is going to fail even when the kref
is gone already.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2] debugobject: add support for kref
  2013-10-24 14:48 ` Thomas Gleixner
@ 2013-11-03 19:33   ` Sebastian Andrzej Siewior
  2013-11-04 16:15     ` Sebastian Andrzej Siewior
  2013-12-11  6:45     ` Greg KH
  0 siblings, 2 replies; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-11-03 19:33 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, jic23, lars, gregkh

I run a couple times into the case where "put" was called on an already
cleanup object. The results was either nothing because "0" went back to
0xff…ff and release was not called a second time or some thing like this
with SLAB once that memory region was used again:

|edma-dma-engine edma-dma-engine.0: freeing channel for 57
|Slab corruption (Not tainted): kmalloc-256 start=edc4c8c0, len=256
|070: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkjkkkkkkkkkkk
|Single bit error detected. Probably bad RAM.
|Run a memory test tool.
|Prev obj: start=edc4c7c0, len=256
|000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
|010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
|Next obj: start=edc4c9c0, len=256
|000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
|010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk

which got me a little confused about the bit flip at first.
The reason for this was resource counting that went wrong followed by a "put"
called from two places.
After the second time running into the same problem using the same driver,
I was looking for something to help me to find atleast one caller (and the
kind of object) a little quicker. Here is a debugobject extension to kref which
seems to do the job.
At kref_init() the debugobject is initialized and activated.
At kref_get() / kref_sub() time it is checked if the kref is active. If
it is, the request is completed otherwise the "usual" debugobject is
printed. Here an example of kref_put() on an already gone object:

|edma-dma-engine edma-dma-engine.0: freeing channel for 57
|------------[ cut here ]------------
|WARNING: CPU: 0 PID: 2053 at lib/debugobjects.c:260 debug_print_object+0x94/0xc4()
|ODEBUG: active_state not available (active state 0) object type: kref hint:   (null)
|Modules linked in: ti_am335x_adc(-)
|[<c0014d38>] (unwind_backtrace+0x0/0xf4) from [<c001249c>] (show_stack+0x14/0x1c)
|[<c001249c>] (show_stack+0x14/0x1c) from [<c0037264>] (warn_slowpath_common+0x64/0x84)
|[<c0037264>] (warn_slowpath_common+0x64/0x84) from [<c0037318>] (warn_slowpath_fmt+0x30/0x40)
|[<c0037318>] (warn_slowpath_fmt+0x30/0x40) from [<c022e8d0>] (debug_print_object+0x94/0xc4)
|[<c022e8d0>] (debug_print_object+0x94/0xc4) from [<c022e9e4>] (debug_object_active_state+0xe4/0x148)
|[<c022e9e4>] (debug_object_active_state+0xe4/0x148) from [<bf0a3474>] (iio_buffer_put+0x24/0x70 [industrialio])
|[<bf0a3474>] (iio_buffer_put+0x24/0x70 [industrialio]) from [<bf0a0340>] (iio_dev_release+0x44/0x64 [industrialio])
|[<bf0a0340>] (iio_dev_release+0x44/0x64 [industrialio]) from [<c0290308>] (device_release+0x2c/0x94)
|[<c0290308>] (device_release+0x2c/0x94) from [<c021f040>] (kobject_release+0x44/0x78)
|[<c021f040>] (kobject_release+0x44/0x78) from [<c029600c>] (release_nodes+0x1a0/0x248)
|[<c029600c>] (release_nodes+0x1a0/0x248) from [<c029355c>] (__device_release_driver+0x98/0xf0)
|[<c029355c>] (__device_release_driver+0x98/0xf0) from [<c0293668>] (driver_detach+0xb4/0xb8)
|[<c0293668>] (driver_detach+0xb4/0xb8) from [<c0292538>] (bus_remove_driver+0x98/0xec)
|[<c0292538>] (bus_remove_driver+0x98/0xec) from [<c008fe2c>] (SyS_delete_module+0x1e0/0x24c)
|[<c008fe2c>] (SyS_delete_module+0x1e0/0x24c) from [<c000e680>] (ret_fast_syscall+0x0/0x48)
|---[ end trace bc5e9551626b178a ]---

This should help to notice that there is a second "put" and the
call chain should point to the object. The "hint" callback is empty because
in the "double free" case we don't have any information and the release
function is passed as an argument to kref_put(). I think the information
is quite good and it is not worth extending debugobject to somehow add
this information.
The "get/put unless" macros aren't traced completely because it is hard
to get it correct without a false positive and without touching each
user. The object might be added to a list which is browsed by someone.
That someone holds the same lock that is required for in the cleanup path
and so the cleanup is blocked. That means that the kref object is
gone from debugobject point of view but the release function has not yet
complete so it is valid to check the current counter.

v1…v2:
- not an RFC anymore
- addressed tglx review:
  - use debug_obj_descr with state active
  - use debug_object_active_state() to check for active object instead the
    other hack I had.
  - added debug_object_free() in a way that does not interfere with the
    NSA sniffer API so it does not get removed from the patch by accident.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---

 include/linux/debugobjects.h |  2 +-
 include/linux/kref.h         | 60 ++++++++++++++++++++++++++++++++++++++++++++
 lib/Kconfig.debug            |  7 ++++++
 lib/debugobjects.c           | 27 +++++++++++++++++---
 4 files changed, 91 insertions(+), 5 deletions(-)

diff --git a/include/linux/debugobjects.h b/include/linux/debugobjects.h
index 98ffcbd..483f487 100644
--- a/include/linux/debugobjects.h
+++ b/include/linux/debugobjects.h
@@ -74,7 +74,7 @@ extern void debug_object_assert_init(void *addr, struct debug_obj_descr *descr);
  * - Set at 0 upon initialization.
  * - Must return to 0 before deactivation.
  */
-extern void
+extern int
 debug_object_active_state(void *addr, struct debug_obj_descr *descr,
 			  unsigned int expect, unsigned int next);
 
diff --git a/include/linux/kref.h b/include/linux/kref.h
index 484604d..f019873 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -20,11 +20,40 @@
 #include <linux/kernel.h>
 #include <linux/mutex.h>
 #include <linux/spinlock.h>
+#include <linux/debugobjects.h>
 
 struct kref {
 	atomic_t refcount;
 };
 
+#ifdef CONFIG_DEBUG_OBJECTS_KREF
+extern struct debug_obj_descr kref_descr;
+
+static inline int kref_check_active(struct kref *kref)
+{
+	return debug_object_active_state(kref, &kref_descr, 0, 0);
+}
+
+static inline void kref_init_activate(struct kref *kref)
+{
+	debug_object_init(kref, &kref_descr);
+	debug_object_activate(kref, &kref_descr);
+}
+
+static inline void kref_deactivate_free(struct kref *kref)
+{
+	debug_object_deactivate(kref, &kref_descr);
+	debug_object_free(kref, &kref_descr);
+}
+
+#else
+
+static inline int kref_check_active(struct kref *kref) { return 0; }
+static inline void kref_init_activate(struct kref *kref) { }
+static inline void kref_deactivate_free(struct kref *kref) { }
+
+#endif
+
 /**
  * kref_init - initialize object.
  * @kref: object in question.
@@ -32,6 +61,7 @@ struct kref {
 static inline void kref_init(struct kref *kref)
 {
 	atomic_set(&kref->refcount, 1);
+	kref_init_activate(kref);
 }
 
 /**
@@ -40,6 +70,12 @@ static inline void kref_init(struct kref *kref)
  */
 static inline void kref_get(struct kref *kref)
 {
+	int ret;
+
+	ret = kref_check_active(kref);
+	if (ret < 0)
+		return;
+
 	/* If refcount was 0 before incrementing then we have a race
 	 * condition when this kref is freeing by some other thread right now.
 	 * In this case one should use kref_get_unless_zero()
@@ -68,9 +104,16 @@ static inline void kref_get(struct kref *kref)
 static inline int kref_sub(struct kref *kref, unsigned int count,
 	     void (*release)(struct kref *kref))
 {
+	int ret;
+
+	ret = kref_check_active(kref);
+	if (ret < 0)
+		return 0;
+
 	WARN_ON(release == NULL);
 
 	if (atomic_sub_and_test((int) count, &kref->refcount)) {
+		kref_deactivate_free(kref);
 		release(kref);
 		return 1;
 	}
@@ -117,16 +160,23 @@ static inline int kref_put_spinlock_irqsave(struct kref *kref,
 		spinlock_t *lock)
 {
 	unsigned long flags;
+	int ret;
 
 	WARN_ON(release == NULL);
+
 	if (atomic_add_unless(&kref->refcount, -1, 1))
 		return 0;
 	spin_lock_irqsave(lock, flags);
+	ret = kref_check_active(kref);
+	if (ret < 0)
+		goto out;
 	if (atomic_dec_and_test(&kref->refcount)) {
+		kref_deactivate_free(kref);
 		release(kref);
 		local_irq_restore(flags);
 		return 1;
 	}
+out:
 	spin_unlock_irqrestore(lock, flags);
 	return 0;
 }
@@ -135,13 +185,23 @@ static inline int kref_put_mutex(struct kref *kref,
 				 void (*release)(struct kref *kref),
 				 struct mutex *lock)
 {
+	int ret;
+
 	WARN_ON(release == NULL);
 	if (unlikely(!atomic_add_unless(&kref->refcount, -1, 1))) {
 		mutex_lock(lock);
+
+		ret = kref_check_active(kref);
+		if (ret < 0) {
+			mutex_unlock(lock);
+			return 0;
+		}
+
 		if (unlikely(!atomic_dec_and_test(&kref->refcount))) {
 			mutex_unlock(lock);
 			return 0;
 		}
+		kref_deactivate_free(kref);
 		release(kref);
 		return 1;
 	}
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 06344d9..ed706b8 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -375,6 +375,13 @@ config DEBUG_OBJECTS_PERCPU_COUNTER
 	  percpu counter routines to track the life time of percpu counter
 	  objects and validate the percpu counter operations.
 
+config DEBUG_OBJECTS_KREF
+	bool "Debug kref objects"
+	depends on DEBUG_OBJECTS
+	help
+	  If you say Y here, additional code will be insterted into the kref
+	  routines to track the life time of a kref object.
+
 config DEBUG_OBJECTS_ENABLE_DEFAULT
 	int "debug_objects bootup default value (0-1)"
         range 0 1
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index bf2c8b1..b735c29 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -362,6 +362,7 @@ void debug_object_init(void *addr, struct debug_obj_descr *descr)
 
 	__debug_object_init(addr, descr, 0);
 }
+EXPORT_SYMBOL_GPL(debug_object_init);
 
 /**
  * debug_object_init_on_stack - debug checks when an object on stack is
@@ -442,6 +443,7 @@ int debug_object_activate(void *addr, struct debug_obj_descr *descr)
 	}
 	return 0;
 }
+EXPORT_SYMBOL_GPL(debug_object_activate);
 
 /**
  * debug_object_deactivate - debug checks when an object is deactivated
@@ -489,6 +491,7 @@ void debug_object_deactivate(void *addr, struct debug_obj_descr *descr)
 
 	raw_spin_unlock_irqrestore(&db->lock, flags);
 }
+EXPORT_SYMBOL_GPL(debug_object_deactivate);
 
 /**
  * debug_object_destroy - debug checks when an object is destroyed
@@ -575,6 +578,7 @@ void debug_object_free(void *addr, struct debug_obj_descr *descr)
 out_unlock:
 	raw_spin_unlock_irqrestore(&db->lock, flags);
 }
+EXPORT_SYMBOL_GPL(debug_object_free);
 
 /**
  * debug_object_assert_init - debug checks when object should be init-ed
@@ -621,16 +625,17 @@ void debug_object_assert_init(void *addr, struct debug_obj_descr *descr)
  * @expect:	expected state
  * @next:	state to move to if expected state is found
  */
-void
+int
 debug_object_active_state(void *addr, struct debug_obj_descr *descr,
 			  unsigned int expect, unsigned int next)
 {
 	struct debug_bucket *db;
 	struct debug_obj *obj;
 	unsigned long flags;
+	int ret;
 
 	if (!debug_objects_enabled)
-		return;
+		return 0;
 
 	db = get_bucket((unsigned long) addr);
 
@@ -640,14 +645,18 @@ debug_object_active_state(void *addr, struct debug_obj_descr *descr,
 	if (obj) {
 		switch (obj->state) {
 		case ODEBUG_STATE_ACTIVE:
-			if (obj->astate == expect)
+			if (obj->astate == expect) {
 				obj->astate = next;
-			else
+				ret = 0;
+			} else {
 				debug_print_object(obj, "active_state");
+				ret = -EINVAL;
+			}
 			break;
 
 		default:
 			debug_print_object(obj, "active_state");
+			ret = -EINVAL;
 			break;
 		}
 	} else {
@@ -656,10 +665,13 @@ debug_object_active_state(void *addr, struct debug_obj_descr *descr,
 				       .descr = descr };
 
 		debug_print_object(&o, "active_state");
+		ret = -EINVAL;
 	}
 
 	raw_spin_unlock_irqrestore(&db->lock, flags);
+	return ret;
 }
+EXPORT_SYMBOL_GPL(debug_object_active_state);
 
 #ifdef CONFIG_DEBUG_OBJECTS_FREE
 static void __debug_check_no_obj_freed(const void *address, unsigned long size)
@@ -1094,3 +1106,10 @@ void __init debug_objects_mem_init(void)
 	} else
 		debug_objects_selftest();
 }
+
+#ifdef CONFIG_DEBUG_OBJECTS_KREF
+struct debug_obj_descr kref_descr = {
+	.name = "kref",
+};
+EXPORT_SYMBOL_GPL(kref_descr);
+#endif
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] debugobject: add support for kref
  2013-11-03 19:33   ` [PATCH v2] " Sebastian Andrzej Siewior
@ 2013-11-04 16:15     ` Sebastian Andrzej Siewior
  2013-12-11  6:45     ` Greg KH
  1 sibling, 0 replies; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-11-04 16:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Sebastian Andrzej Siewior, jic23, lars, gregkh

On 03.11.13, Sebastian Andrzej Siewior wrote:
> v1…v2:
> - not an RFC anymore
> - addressed tglx review:
>   - use debug_obj_descr with state active
>   - use debug_object_active_state() to check for active object instead the
>     other hack I had.
>   - added debug_object_free() in a way that does not interfere with the
>     NSA sniffer API so it does not get removed from the patch by accident.

now I just run into output like
|ODEBUG: free active (active state 0) object type: kref hint: gserial_alloc_line+0x90/0x194 [u_serial]

and yes I added a hint for that which points to where the kref got it
kref_init() call from. Now I am not sure if this a false positive
(we did kref_init() never used it so the whole struct was removed) or if
it really found something.
And while I hit this, I noticed that I really need a fixup function or
else debugobject will loop forever in __debug_check_no_obj_freed().

Sebastian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] debugobject: add support for kref
  2013-11-03 19:33   ` [PATCH v2] " Sebastian Andrzej Siewior
  2013-11-04 16:15     ` Sebastian Andrzej Siewior
@ 2013-12-11  6:45     ` Greg KH
  2013-12-11  9:21       ` Thomas Gleixner
  2013-12-11  9:30       ` Sebastian Andrzej Siewior
  1 sibling, 2 replies; 7+ messages in thread
From: Greg KH @ 2013-12-11  6:45 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: Thomas Gleixner, linux-kernel, jic23, lars

On Sun, Nov 03, 2013 at 08:33:08PM +0100, Sebastian Andrzej Siewior wrote:
> I run a couple times into the case where "put" was called on an already
> cleanup object. The results was either nothing because "0" went back to
> 0xff…ff and release was not called a second time or some thing like this
> with SLAB once that memory region was used again:
> 
> |edma-dma-engine edma-dma-engine.0: freeing channel for 57
> |Slab corruption (Not tainted): kmalloc-256 start=edc4c8c0, len=256
> |070: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkjkkkkkkkkkkk
> |Single bit error detected. Probably bad RAM.
> |Run a memory test tool.
> |Prev obj: start=edc4c7c0, len=256
> |000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> |010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> |Next obj: start=edc4c9c0, len=256
> |000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> |010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> 
> which got me a little confused about the bit flip at first.
> The reason for this was resource counting that went wrong followed by a "put"
> called from two places.
> After the second time running into the same problem using the same driver,
> I was looking for something to help me to find atleast one caller (and the
> kind of object) a little quicker. Here is a debugobject extension to kref which
> seems to do the job.
> At kref_init() the debugobject is initialized and activated.
> At kref_get() / kref_sub() time it is checked if the kref is active. If
> it is, the request is completed otherwise the "usual" debugobject is
> printed. Here an example of kref_put() on an already gone object:
> 
> |edma-dma-engine edma-dma-engine.0: freeing channel for 57
> |------------[ cut here ]------------
> |WARNING: CPU: 0 PID: 2053 at lib/debugobjects.c:260 debug_print_object+0x94/0xc4()
> |ODEBUG: active_state not available (active state 0) object type: kref hint:   (null)
> |Modules linked in: ti_am335x_adc(-)
> |[<c0014d38>] (unwind_backtrace+0x0/0xf4) from [<c001249c>] (show_stack+0x14/0x1c)
> |[<c001249c>] (show_stack+0x14/0x1c) from [<c0037264>] (warn_slowpath_common+0x64/0x84)
> |[<c0037264>] (warn_slowpath_common+0x64/0x84) from [<c0037318>] (warn_slowpath_fmt+0x30/0x40)
> |[<c0037318>] (warn_slowpath_fmt+0x30/0x40) from [<c022e8d0>] (debug_print_object+0x94/0xc4)
> |[<c022e8d0>] (debug_print_object+0x94/0xc4) from [<c022e9e4>] (debug_object_active_state+0xe4/0x148)
> |[<c022e9e4>] (debug_object_active_state+0xe4/0x148) from [<bf0a3474>] (iio_buffer_put+0x24/0x70 [industrialio])
> |[<bf0a3474>] (iio_buffer_put+0x24/0x70 [industrialio]) from [<bf0a0340>] (iio_dev_release+0x44/0x64 [industrialio])
> |[<bf0a0340>] (iio_dev_release+0x44/0x64 [industrialio]) from [<c0290308>] (device_release+0x2c/0x94)
> |[<c0290308>] (device_release+0x2c/0x94) from [<c021f040>] (kobject_release+0x44/0x78)
> |[<c021f040>] (kobject_release+0x44/0x78) from [<c029600c>] (release_nodes+0x1a0/0x248)
> |[<c029600c>] (release_nodes+0x1a0/0x248) from [<c029355c>] (__device_release_driver+0x98/0xf0)
> |[<c029355c>] (__device_release_driver+0x98/0xf0) from [<c0293668>] (driver_detach+0xb4/0xb8)
> |[<c0293668>] (driver_detach+0xb4/0xb8) from [<c0292538>] (bus_remove_driver+0x98/0xec)
> |[<c0292538>] (bus_remove_driver+0x98/0xec) from [<c008fe2c>] (SyS_delete_module+0x1e0/0x24c)
> |[<c008fe2c>] (SyS_delete_module+0x1e0/0x24c) from [<c000e680>] (ret_fast_syscall+0x0/0x48)
> |---[ end trace bc5e9551626b178a ]---
> 
> This should help to notice that there is a second "put" and the
> call chain should point to the object. The "hint" callback is empty because
> in the "double free" case we don't have any information and the release
> function is passed as an argument to kref_put(). I think the information
> is quite good and it is not worth extending debugobject to somehow add
> this information.
> The "get/put unless" macros aren't traced completely because it is hard
> to get it correct without a false positive and without touching each
> user. The object might be added to a list which is browsed by someone.
> That someone holds the same lock that is required for in the cleanup path
> and so the cleanup is blocked. That means that the kref object is
> gone from debugobject point of view but the release function has not yet
> complete so it is valid to check the current counter.
> 
> v1…v2:
> - not an RFC anymore
> - addressed tglx review:
>   - use debug_obj_descr with state active
>   - use debug_object_active_state() to check for active object instead the
>     other hack I had.
>   - added debug_object_free() in a way that does not interfere with the
>     NSA sniffer API so it does not get removed from the patch by accident.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

I need an ack from Thomas or some other debugobjects-knowledgable
developer before I can take this...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] debugobject: add support for kref
  2013-12-11  6:45     ` Greg KH
@ 2013-12-11  9:21       ` Thomas Gleixner
  2013-12-11  9:30       ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 7+ messages in thread
From: Thomas Gleixner @ 2013-12-11  9:21 UTC (permalink / raw)
  To: Greg KH; +Cc: Sebastian Andrzej Siewior, linux-kernel, jic23, lars

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5189 bytes --]

On Tue, 10 Dec 2013, Greg KH wrote:
> On Sun, Nov 03, 2013 at 08:33:08PM +0100, Sebastian Andrzej Siewior wrote:
> > I run a couple times into the case where "put" was called on an already
> > cleanup object. The results was either nothing because "0" went back to
> > 0xff…ff and release was not called a second time or some thing like this
> > with SLAB once that memory region was used again:
> > 
> > |edma-dma-engine edma-dma-engine.0: freeing channel for 57
> > |Slab corruption (Not tainted): kmalloc-256 start=edc4c8c0, len=256
> > |070: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkjkkkkkkkkkkk
> > |Single bit error detected. Probably bad RAM.
> > |Run a memory test tool.
> > |Prev obj: start=edc4c7c0, len=256
> > |000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> > |010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> > |Next obj: start=edc4c9c0, len=256
> > |000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> > |010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> > 
> > which got me a little confused about the bit flip at first.
> > The reason for this was resource counting that went wrong followed by a "put"
> > called from two places.
> > After the second time running into the same problem using the same driver,
> > I was looking for something to help me to find atleast one caller (and the
> > kind of object) a little quicker. Here is a debugobject extension to kref which
> > seems to do the job.
> > At kref_init() the debugobject is initialized and activated.
> > At kref_get() / kref_sub() time it is checked if the kref is active. If
> > it is, the request is completed otherwise the "usual" debugobject is
> > printed. Here an example of kref_put() on an already gone object:
> > 
> > |edma-dma-engine edma-dma-engine.0: freeing channel for 57
> > |------------[ cut here ]------------
> > |WARNING: CPU: 0 PID: 2053 at lib/debugobjects.c:260 debug_print_object+0x94/0xc4()
> > |ODEBUG: active_state not available (active state 0) object type: kref hint:   (null)
> > |Modules linked in: ti_am335x_adc(-)
> > |[<c0014d38>] (unwind_backtrace+0x0/0xf4) from [<c001249c>] (show_stack+0x14/0x1c)
> > |[<c001249c>] (show_stack+0x14/0x1c) from [<c0037264>] (warn_slowpath_common+0x64/0x84)
> > |[<c0037264>] (warn_slowpath_common+0x64/0x84) from [<c0037318>] (warn_slowpath_fmt+0x30/0x40)
> > |[<c0037318>] (warn_slowpath_fmt+0x30/0x40) from [<c022e8d0>] (debug_print_object+0x94/0xc4)
> > |[<c022e8d0>] (debug_print_object+0x94/0xc4) from [<c022e9e4>] (debug_object_active_state+0xe4/0x148)
> > |[<c022e9e4>] (debug_object_active_state+0xe4/0x148) from [<bf0a3474>] (iio_buffer_put+0x24/0x70 [industrialio])
> > |[<bf0a3474>] (iio_buffer_put+0x24/0x70 [industrialio]) from [<bf0a0340>] (iio_dev_release+0x44/0x64 [industrialio])
> > |[<bf0a0340>] (iio_dev_release+0x44/0x64 [industrialio]) from [<c0290308>] (device_release+0x2c/0x94)
> > |[<c0290308>] (device_release+0x2c/0x94) from [<c021f040>] (kobject_release+0x44/0x78)
> > |[<c021f040>] (kobject_release+0x44/0x78) from [<c029600c>] (release_nodes+0x1a0/0x248)
> > |[<c029600c>] (release_nodes+0x1a0/0x248) from [<c029355c>] (__device_release_driver+0x98/0xf0)
> > |[<c029355c>] (__device_release_driver+0x98/0xf0) from [<c0293668>] (driver_detach+0xb4/0xb8)
> > |[<c0293668>] (driver_detach+0xb4/0xb8) from [<c0292538>] (bus_remove_driver+0x98/0xec)
> > |[<c0292538>] (bus_remove_driver+0x98/0xec) from [<c008fe2c>] (SyS_delete_module+0x1e0/0x24c)
> > |[<c008fe2c>] (SyS_delete_module+0x1e0/0x24c) from [<c000e680>] (ret_fast_syscall+0x0/0x48)
> > |---[ end trace bc5e9551626b178a ]---
> > 
> > This should help to notice that there is a second "put" and the
> > call chain should point to the object. The "hint" callback is empty because
> > in the "double free" case we don't have any information and the release
> > function is passed as an argument to kref_put(). I think the information
> > is quite good and it is not worth extending debugobject to somehow add
> > this information.
> > The "get/put unless" macros aren't traced completely because it is hard
> > to get it correct without a false positive and without touching each
> > user. The object might be added to a list which is browsed by someone.
> > That someone holds the same lock that is required for in the cleanup path
> > and so the cleanup is blocked. That means that the kref object is
> > gone from debugobject point of view but the release function has not yet
> > complete so it is valid to check the current counter.
> > 
> > v1…v2:
> > - not an RFC anymore
> > - addressed tglx review:
> >   - use debug_obj_descr with state active
> >   - use debug_object_active_state() to check for active object instead the
> >     other hack I had.
> >   - added debug_object_free() in a way that does not interfere with the
> >     NSA sniffer API so it does not get removed from the patch by accident.
> > 
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> I need an ack from Thomas or some other debugobjects-knowledgable
> developer before I can take this...

Reviwed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] debugobject: add support for kref
  2013-12-11  6:45     ` Greg KH
  2013-12-11  9:21       ` Thomas Gleixner
@ 2013-12-11  9:30       ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-12-11  9:30 UTC (permalink / raw)
  To: Greg KH; +Cc: Thomas Gleixner, linux-kernel, jic23, lars

On 12/11/2013 07:45 AM, Greg KH wrote:
> I need an ack from Thomas or some other debugobjects-knowledgable
> developer before I can take this...

I have a v3 which I haven't posted yet. Basically the whole thing seems
to work. "Seems" means that I get a bunch of backtraces which I haven't
checked to see if it is a kref-accounting bug (something that should be
fixed) or something that is correct but kref isn't used properly.

One example is serial port init via tty_port_init(): It initializes
&port->kref. To use that kref you need to use tty_port_get()/put() and
more importantly dynamically allocate tty_port _or_ provide
port->ops->destruct callback. This seems not be done by everyone, for
instance gs_port_alloc() [0]. The question is this a bug and should be
fixed (and kref's debugobject is right here) or it is "legal" behavior.

I get another backtrace on "rmmod sg" which shows
ODEBUG: free active (active state 0) object type: kref hint:
kset_init+0xe/0x30
[<c112fec8>] kfree+0x128/0x280
[<c13784c2>] ? class_release+0x22/0x30
[<c13784c2>] class_release+0x22/0x30
[<c12da3bd>] kobject_cleanup+0x2d/0x70
[<c12da2c7>] kobject_put+0x67/0xa0
[<c12da34f>] kset_unregister+0xf/0x20
[<c13787b3>] class_destroy+0x23/0x30
[<f850ee62>] exit_sg+0x17/0x3c [sg]

so it is triggered by class_destory() and I haven't have the time check
what is wrong here.

Based on this I would like to know what could be the best way forward
here. Post v3 and work on this backtraces before it gets merged or
after it gets merged.

[0] drivers/usb/gadget/u_serial.c

> thanks,
> 
> greg k-h
> 
Sebastian

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-12-11  9:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-24 10:09 [RFC PATCH] debugobject: add support for kref Sebastian Andrzej Siewior
2013-10-24 14:48 ` Thomas Gleixner
2013-11-03 19:33   ` [PATCH v2] " Sebastian Andrzej Siewior
2013-11-04 16:15     ` Sebastian Andrzej Siewior
2013-12-11  6:45     ` Greg KH
2013-12-11  9:21       ` Thomas Gleixner
2013-12-11  9:30       ` Sebastian Andrzej Siewior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).