All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch 0/4] DEBUG_RCU_HEAD: Debug and fix racy call_rcu() users
@ 2009-10-06 14:37 Mathieu Desnoyers
  2009-10-06 14:37 ` [patch 1/4] kernel call_rcu usage: initialize rcu_head structures Mathieu Desnoyers
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2009-10-06 14:37 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Paul E. McKenney

Here is a patchset, done on 2.6.30.9, which permits to detect and fix racy
call_rcu() users.

Fix for vunmap, which happens to be one of them, is included.

Note that some false-positives might come up when call_rcu() is called on
uninitialized struct rcu_head. The solution is, surprisingly enough, to
initialize them.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [patch 1/4] kernel call_rcu usage: initialize rcu_head structures
  2009-10-06 14:37 [patch 0/4] DEBUG_RCU_HEAD: Debug and fix racy call_rcu() users Mathieu Desnoyers
@ 2009-10-06 14:37 ` Mathieu Desnoyers
  2009-10-06 15:19   ` [patch 1/4] kernel call_rcu usage: initialize rcu_head structures (v2) Mathieu Desnoyers
  2009-10-06 14:37 ` [patch 2/4] tree rcu: Add debug RCU head option Mathieu Desnoyers
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Mathieu Desnoyers @ 2009-10-06 14:37 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Paul E. McKenney; +Cc: Mathieu Desnoyers

[-- Attachment #1: rcu-init-null.patch --]
[-- Type: text/plain, Size: 1884 bytes --]

Initialize rcu_head structures before passing them to call_rcu() to eliminate
false positives in DEBUG_RCU_HEAD.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: mingo@elte.hu
CC: akpm@linux-foundation.org
---
 kernel/exit.c     |    1 +
 kernel/pid.c      |    1 +
 kernel/rcupdate.c |    1 +
 3 files changed, 3 insertions(+)

Index: linux-2.6-lttng/kernel/rcupdate.c
===================================================================
--- linux-2.6-lttng.orig/kernel/rcupdate.c	2009-10-06 09:00:06.000000000 -0400
+++ linux-2.6-lttng/kernel/rcupdate.c	2009-10-06 09:01:14.000000000 -0400
@@ -92,6 +92,7 @@ void synchronize_rcu(void)
 
 	init_completion(&rcu.completion);
 	/* Will wake me after RCU finished. */
+	INIT_RCU_HEAD(&rcu.head);
 	call_rcu(&rcu.head, wakeme_after_rcu);
 	/* Wait for it. */
 	wait_for_completion(&rcu.completion);
Index: linux-2.6-lttng/kernel/exit.c
===================================================================
--- linux-2.6-lttng.orig/kernel/exit.c	2009-10-06 09:05:21.000000000 -0400
+++ linux-2.6-lttng/kernel/exit.c	2009-10-06 09:05:34.000000000 -0400
@@ -208,6 +208,7 @@ repeat:
 
 	write_unlock_irq(&tasklist_lock);
 	release_thread(p);
+	INIT_RCU_HEAD(&p->rcu);
 	call_rcu(&p->rcu, delayed_put_task_struct);
 
 	p = leader;
Index: linux-2.6-lttng/kernel/pid.c
===================================================================
--- linux-2.6-lttng.orig/kernel/pid.c	2009-10-06 09:10:59.000000000 -0400
+++ linux-2.6-lttng/kernel/pid.c	2009-10-06 09:11:43.000000000 -0400
@@ -236,6 +236,7 @@ void free_pid(struct pid *pid)
 	for (i = 0; i <= pid->level; i++)
 		free_pidmap(pid->numbers + i);
 
+	INIT_RCU_HEAD(&pid->rcu);
 	call_rcu(&pid->rcu, delayed_put_pid);
 }
 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [patch 2/4] tree rcu: Add debug RCU head option
  2009-10-06 14:37 [patch 0/4] DEBUG_RCU_HEAD: Debug and fix racy call_rcu() users Mathieu Desnoyers
  2009-10-06 14:37 ` [patch 1/4] kernel call_rcu usage: initialize rcu_head structures Mathieu Desnoyers
@ 2009-10-06 14:37 ` Mathieu Desnoyers
  2009-10-06 15:54   ` Eric Dumazet
  2009-10-06 14:37 ` [patch 3/4] markers call_rcu usage: initialize rcu_head structures Mathieu Desnoyers
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Mathieu Desnoyers @ 2009-10-06 14:37 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Paul E. McKenney; +Cc: Mathieu Desnoyers

[-- Attachment #1: rcu-head-debug.patch --]
[-- Type: text/plain, Size: 2989 bytes --]

Poisoning the rcu_head callback list. Only for rcu tree for now.

Helps finding racy users of call_rcu(), which results in hangs because list
entries are overwritten and/or skipped. Using the lower bit to poison because
include/net/dst.h __pad_to_align_refcnt complains when struct rcu_head grows.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: mingo@elte.hu
CC: akpm@linux-foundation.org
---
 include/linux/rcupdate.h |    2 ++
 kernel/rcutree.c         |   10 ++++++++++
 lib/Kconfig.debug        |    9 +++++++++
 3 files changed, 21 insertions(+)

Index: linux-2.6-lttng/include/linux/rcupdate.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/rcupdate.h	2009-10-06 10:35:15.000000000 -0400
+++ linux-2.6-lttng/include/linux/rcupdate.h	2009-10-06 10:35:18.000000000 -0400
@@ -45,6 +45,8 @@
  * struct rcu_head - callback structure for use with RCU
  * @next: next update requests in a list
  * @func: actual update function to call after the grace period.
+ *
+ * Debug mode assumes func pointer value is word-aligned.
  */
 struct rcu_head {
 	struct rcu_head *next;
Index: linux-2.6-lttng/kernel/rcutree.c
===================================================================
--- linux-2.6-lttng.orig/kernel/rcutree.c	2009-10-06 10:35:15.000000000 -0400
+++ linux-2.6-lttng/kernel/rcutree.c	2009-10-06 10:35:27.000000000 -0400
@@ -927,6 +927,10 @@ static void rcu_do_batch(struct rcu_data
 		next = list->next;
 		prefetch(next);
 		trace_rcu_tree_callback(list);
+#ifdef DEBUG_RCU_HEAD
+		WARN_ON_ONCE(!((unsigned long)list->func & 0x1));
+		list->func = (void *)((unsigned long)list->func & ~0x1);
+#endif
 		list->func(list);
 		list = next;
 		if (++count >= rdp->blimit)
@@ -1194,7 +1198,13 @@ __call_rcu(struct rcu_head *head, void (
 	unsigned long flags;
 	struct rcu_data *rdp;
 
+#ifdef DEBUG_RCU_HEAD
+	WARN_ON_ONCE((unsigned long)head->func & 0x1);
+	WARN_ON_ONCE((unsigned long)func & 0x1);
+	head->func = (void *)((unsigned long)func | 0x1);
+#else
 	head->func = func;
+#endif
 	head->next = NULL;
 
 	smp_mb(); /* Ensure RCU update seen before callback registry. */
Index: linux-2.6-lttng/lib/Kconfig.debug
===================================================================
--- linux-2.6-lttng.orig/lib/Kconfig.debug	2009-10-06 10:35:15.000000000 -0400
+++ linux-2.6-lttng/lib/Kconfig.debug	2009-10-06 10:35:18.000000000 -0400
@@ -598,6 +598,15 @@ config DEBUG_LIST
 
 	  If unsure, say N.
 
+config DEBUG_RCU_HEAD
+	bool "Debug RCU callbacks"
+	depends on DEBUG_KERNEL
+	depends on TREE_RCU
+	help
+	  Enable this to turn on debugging of RCU list heads (call_rcu() usage).
+	  Seems to find problems more quickly with stress-tests in single-cpu
+	  mode.
+
 config DEBUG_SG
 	bool "Debug SG table operations"
 	depends on DEBUG_KERNEL

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [patch 3/4] markers call_rcu usage: initialize rcu_head structures
  2009-10-06 14:37 [patch 0/4] DEBUG_RCU_HEAD: Debug and fix racy call_rcu() users Mathieu Desnoyers
  2009-10-06 14:37 ` [patch 1/4] kernel call_rcu usage: initialize rcu_head structures Mathieu Desnoyers
  2009-10-06 14:37 ` [patch 2/4] tree rcu: Add debug RCU head option Mathieu Desnoyers
@ 2009-10-06 14:37 ` Mathieu Desnoyers
  2009-10-06 14:37   ` Mathieu Desnoyers
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2009-10-06 14:37 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Paul E. McKenney; +Cc: Mathieu Desnoyers

[-- Attachment #1: marker-fix-rcu.patch --]
[-- Type: text/plain, Size: 901 bytes --]

Initialize rcu_head structures before passing them to call_rcu() to eliminate
false positives in DEBUG_RCU_HEAD.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: mingo@elte.hu
CC: akpm@linux-foundation.org
---
 kernel/marker.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6-lttng/kernel/marker.c
===================================================================
--- linux-2.6-lttng.orig/kernel/marker.c	2009-10-06 09:16:32.000000000 -0400
+++ linux-2.6-lttng/kernel/marker.c	2009-10-06 09:22:07.000000000 -0400
@@ -451,6 +451,7 @@ static struct marker_entry *add_marker(c
 	e->format_allocated = 0;
 	e->refcount = 0;
 	e->rcu_pending = 0;
+	INIT_RCU_HEAD(&e->rcu);
 	hlist_add_head(&e->hlist, head);
 	return e;
 }

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [patch 4/4] vunmap: Fix racy use of rcu_head
  2009-10-06 14:37 [patch 0/4] DEBUG_RCU_HEAD: Debug and fix racy call_rcu() users Mathieu Desnoyers
@ 2009-10-06 14:37   ` Mathieu Desnoyers
  2009-10-06 14:37 ` [patch 2/4] tree rcu: Add debug RCU head option Mathieu Desnoyers
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2009-10-06 14:37 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Paul E. McKenney
  Cc: Mathieu Desnoyers, cl, linux-mm

[-- Attachment #1: vunmap-fix-teardown.patch --]
[-- Type: text/plain, Size: 3613 bytes --]

Repetitive use of vmap/vunmap on the same address triggers this bug.

Simplest fix: directly kfree the data structure rather than doing it lazily.

Impact:
(-) slight slowdown on vunmap
(+) stop messing up rcu callback lists ;)

Caught it with DEBUG_RCU_HEAD, running LTTng armall/disarmall in loops.
Immediate values (with breakpoint-ipi scheme) are still using vunmap.

------------[ cut here ]------------
WARNING: at kernel/rcutree.c:1199 __call_rcu+0x181/0x190()
Hardware name: X7DAL
Modules linked in: loop ltt_statedump ltt_userspace_event ipc_tr]
Pid: 4527, comm: ltt-armall Not tainted 2.6.30.9-trace #30
Call Trace:
 [<ffffffff8027b181>] ? __call_rcu+0x181/0x190
 [<ffffffff8027b181>] ? __call_rcu+0x181/0x190
 [<ffffffff8023a6a9>] ? warn_slowpath_common+0x79/0xd0
 [<ffffffff802b4890>] ? rcu_free_va+0x0/0x10
 [<ffffffff8027b181>] ? __call_rcu+0x181/0x190
 [<ffffffff802b5298>] ? __purge_vmap_area_lazy+0x1a8/0x1e0
 [<ffffffff802b59d4>] ? free_unmap_vmap_area_noflush+0x74/0x80
 [<ffffffff802b5a0e>] ? remove_vm_area+0x2e/0x80
 [<ffffffff802b5b35>] ? __vunmap+0x45/0xf0
 [<ffffffffa002e826>] ? ltt_statedump_start+0x7b6/0x820 [ltt_sta]
 [<ffffffff8068978a>] ? arch_imv_update+0x16a/0x2f0
 [<ffffffff8027dd13>] ? imv_update_range+0x53/0xa0
 [<ffffffff8026235b>] ? _module_imv_update+0x4b/0x60
 [<ffffffff80262385>] ? module_imv_update+0x15/0x30
 [<ffffffff80280049>] ? marker_probe_register+0x149/0xb90
 [<ffffffff8040c8e0>] ? ltt_vtrace+0x0/0x8c0
 [<ffffffff80409330>] ? ltt_marker_connect+0xd0/0x150
 [<ffffffff8040f8dc>] ? marker_enable_write+0xec/0x120
 [<ffffffff802c6cb8>] ? __dentry_open+0x268/0x350
 [<ffffffff80280b2c>] ? marker_probe_cb+0x9c/0x170
 [<ffffffff80280b2c>] ? marker_probe_cb+0x9c/0x170
 [<ffffffff802c973b>] ? vfs_write+0xcb/0x170
 [<ffffffff802c9984>] ? sys_write+0x64/0x130
 [<ffffffff8020beeb>] ? system_call_fastpath+0x16/0x1b
---[ end trace ef92443e716fa199 ]---
------------[ cut here ]------------

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: cl@linux-foundation.org
CC: mingo@elte.hu
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: linux-mm@kvack.org
CC: akpm@linux-foundation.org
---
 mm/vmalloc.c |   18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)

Index: linux-2.6-lttng/mm/vmalloc.c
===================================================================
--- linux-2.6-lttng.orig/mm/vmalloc.c	2009-10-06 09:37:04.000000000 -0400
+++ linux-2.6-lttng/mm/vmalloc.c	2009-10-06 09:37:56.000000000 -0400
@@ -417,13 +417,6 @@ overflow:
 	return va;
 }
 
-static void rcu_free_va(struct rcu_head *head)
-{
-	struct vmap_area *va = container_of(head, struct vmap_area, rcu_head);
-
-	kfree(va);
-}
-
 static void __free_vmap_area(struct vmap_area *va)
 {
 	BUG_ON(RB_EMPTY_NODE(&va->rb_node));
@@ -431,7 +424,7 @@ static void __free_vmap_area(struct vmap
 	RB_CLEAR_NODE(&va->rb_node);
 	list_del_rcu(&va->list);
 
-	call_rcu(&va->rcu_head, rcu_free_va);
+	kfree(va);
 }
 
 /*
@@ -757,13 +750,6 @@ static struct vmap_block *new_vmap_block
 	return vb;
 }
 
-static void rcu_free_vb(struct rcu_head *head)
-{
-	struct vmap_block *vb = container_of(head, struct vmap_block, rcu_head);
-
-	kfree(vb);
-}
-
 static void free_vmap_block(struct vmap_block *vb)
 {
 	struct vmap_block *tmp;
@@ -778,7 +764,7 @@ static void free_vmap_block(struct vmap_
 	BUG_ON(tmp != vb);
 
 	free_unmap_vmap_area_noflush(vb->va);
-	call_rcu(&vb->rcu_head, rcu_free_vb);
+	kfree(vb);
 }
 
 static void *vb_alloc(unsigned long size, gfp_t gfp_mask)

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [patch 4/4] vunmap: Fix racy use of rcu_head
@ 2009-10-06 14:37   ` Mathieu Desnoyers
  0 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2009-10-06 14:37 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Paul E. McKenney
  Cc: Mathieu Desnoyers, cl, linux-mm

[-- Attachment #1: vunmap-fix-teardown.patch --]
[-- Type: text/plain, Size: 3839 bytes --]

Repetitive use of vmap/vunmap on the same address triggers this bug.

Simplest fix: directly kfree the data structure rather than doing it lazily.

Impact:
(-) slight slowdown on vunmap
(+) stop messing up rcu callback lists ;)

Caught it with DEBUG_RCU_HEAD, running LTTng armall/disarmall in loops.
Immediate values (with breakpoint-ipi scheme) are still using vunmap.

------------[ cut here ]------------
WARNING: at kernel/rcutree.c:1199 __call_rcu+0x181/0x190()
Hardware name: X7DAL
Modules linked in: loop ltt_statedump ltt_userspace_event ipc_tr]
Pid: 4527, comm: ltt-armall Not tainted 2.6.30.9-trace #30
Call Trace:
 [<ffffffff8027b181>] ? __call_rcu+0x181/0x190
 [<ffffffff8027b181>] ? __call_rcu+0x181/0x190
 [<ffffffff8023a6a9>] ? warn_slowpath_common+0x79/0xd0
 [<ffffffff802b4890>] ? rcu_free_va+0x0/0x10
 [<ffffffff8027b181>] ? __call_rcu+0x181/0x190
 [<ffffffff802b5298>] ? __purge_vmap_area_lazy+0x1a8/0x1e0
 [<ffffffff802b59d4>] ? free_unmap_vmap_area_noflush+0x74/0x80
 [<ffffffff802b5a0e>] ? remove_vm_area+0x2e/0x80
 [<ffffffff802b5b35>] ? __vunmap+0x45/0xf0
 [<ffffffffa002e826>] ? ltt_statedump_start+0x7b6/0x820 [ltt_sta]
 [<ffffffff8068978a>] ? arch_imv_update+0x16a/0x2f0
 [<ffffffff8027dd13>] ? imv_update_range+0x53/0xa0
 [<ffffffff8026235b>] ? _module_imv_update+0x4b/0x60
 [<ffffffff80262385>] ? module_imv_update+0x15/0x30
 [<ffffffff80280049>] ? marker_probe_register+0x149/0xb90
 [<ffffffff8040c8e0>] ? ltt_vtrace+0x0/0x8c0
 [<ffffffff80409330>] ? ltt_marker_connect+0xd0/0x150
 [<ffffffff8040f8dc>] ? marker_enable_write+0xec/0x120
 [<ffffffff802c6cb8>] ? __dentry_open+0x268/0x350
 [<ffffffff80280b2c>] ? marker_probe_cb+0x9c/0x170
 [<ffffffff80280b2c>] ? marker_probe_cb+0x9c/0x170
 [<ffffffff802c973b>] ? vfs_write+0xcb/0x170
 [<ffffffff802c9984>] ? sys_write+0x64/0x130
 [<ffffffff8020beeb>] ? system_call_fastpath+0x16/0x1b
---[ end trace ef92443e716fa199 ]---
------------[ cut here ]------------

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: cl@linux-foundation.org
CC: mingo@elte.hu
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: linux-mm@kvack.org
CC: akpm@linux-foundation.org
---
 mm/vmalloc.c |   18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)

Index: linux-2.6-lttng/mm/vmalloc.c
===================================================================
--- linux-2.6-lttng.orig/mm/vmalloc.c	2009-10-06 09:37:04.000000000 -0400
+++ linux-2.6-lttng/mm/vmalloc.c	2009-10-06 09:37:56.000000000 -0400
@@ -417,13 +417,6 @@ overflow:
 	return va;
 }
 
-static void rcu_free_va(struct rcu_head *head)
-{
-	struct vmap_area *va = container_of(head, struct vmap_area, rcu_head);
-
-	kfree(va);
-}
-
 static void __free_vmap_area(struct vmap_area *va)
 {
 	BUG_ON(RB_EMPTY_NODE(&va->rb_node));
@@ -431,7 +424,7 @@ static void __free_vmap_area(struct vmap
 	RB_CLEAR_NODE(&va->rb_node);
 	list_del_rcu(&va->list);
 
-	call_rcu(&va->rcu_head, rcu_free_va);
+	kfree(va);
 }
 
 /*
@@ -757,13 +750,6 @@ static struct vmap_block *new_vmap_block
 	return vb;
 }
 
-static void rcu_free_vb(struct rcu_head *head)
-{
-	struct vmap_block *vb = container_of(head, struct vmap_block, rcu_head);
-
-	kfree(vb);
-}
-
 static void free_vmap_block(struct vmap_block *vb)
 {
 	struct vmap_block *tmp;
@@ -778,7 +764,7 @@ static void free_vmap_block(struct vmap_
 	BUG_ON(tmp != vb);
 
 	free_unmap_vmap_area_noflush(vb->va);
-	call_rcu(&vb->rcu_head, rcu_free_vb);
+	kfree(vb);
 }
 
 static void *vb_alloc(unsigned long size, gfp_t gfp_mask)

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 1/4] kernel call_rcu usage: initialize rcu_head structures (v2)
  2009-10-06 14:37 ` [patch 1/4] kernel call_rcu usage: initialize rcu_head structures Mathieu Desnoyers
@ 2009-10-06 15:19   ` Mathieu Desnoyers
  0 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2009-10-06 15:19 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Paul E. McKenney

Initialize rcu_head structures before passing them to call_rcu() to eliminate
false positives in DEBUG_RCU_HEAD.

Update: init at structure allocation time rather than just before
calling call_rcu(). Ensures we do not void the debug race test.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: mingo@elte.hu
CC: akpm@linux-foundation.org
---
 kernel/fork.c     |    1 +
 kernel/pid.c      |    1 +
 kernel/rcupdate.c |    1 +
 3 files changed, 3 insertions(+)

Index: linux-2.6-lttng/kernel/rcupdate.c
===================================================================
--- linux-2.6-lttng.orig/kernel/rcupdate.c	2009-10-06 10:34:25.000000000 -0400
+++ linux-2.6-lttng/kernel/rcupdate.c	2009-10-06 11:17:16.000000000 -0400
@@ -90,6 +90,7 @@ void synchronize_rcu(void)
 	if (rcu_blocking_is_gp())
 		return;
 
+	INIT_RCU_HEAD(&rcu.head);
 	init_completion(&rcu.completion);
 	/* Will wake me after RCU finished. */
 	call_rcu(&rcu.head, wakeme_after_rcu);
Index: linux-2.6-lttng/kernel/pid.c
===================================================================
--- linux-2.6-lttng.orig/kernel/pid.c	2009-10-06 10:34:25.000000000 -0400
+++ linux-2.6-lttng/kernel/pid.c	2009-10-06 11:12:46.000000000 -0400
@@ -264,6 +264,7 @@ struct pid *alloc_pid(struct pid_namespa
 
 	get_pid_ns(ns);
 	pid->level = ns->level;
+	INIT_RCU_HEAD(&pid->rcu);
 	atomic_set(&pid->count, 1);
 	for (type = 0; type < PIDTYPE_MAX; ++type)
 		INIT_HLIST_HEAD(&pid->tasks[type]);
Index: linux-2.6-lttng/kernel/fork.c
===================================================================
--- linux-2.6-lttng.orig/kernel/fork.c	2009-10-06 11:11:10.000000000 -0400
+++ linux-2.6-lttng/kernel/fork.c	2009-10-06 11:11:54.000000000 -0400
@@ -1012,6 +1012,7 @@ static struct task_struct *copy_process(
 	p->rcu_read_lock_nesting = 0;
 	p->rcu_flipctr_idx = 0;
 #endif /* #ifdef CONFIG_PREEMPT_RCU */
+	INIT_RCU_HEAD(&p->rcu);
 	p->vfork_done = NULL;
 	spin_lock_init(&p->alloc_lock);
 
-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 4/4] vunmap: Fix racy use of rcu_head
  2009-10-06 14:37   ` Mathieu Desnoyers
@ 2009-10-06 15:23     ` Mathieu Desnoyers
  -1 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2009-10-06 15:23 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Paul E. McKenney; +Cc: cl, linux-mm

* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> Repetitive use of vmap/vunmap on the same address triggers this bug.
> 
> Simplest fix: directly kfree the data structure rather than doing it lazily.
> 

I assumed that call_rcu in vunmap() is only used to postpone kfree()
from the hot path, but I ask for comfirmation/infirmation about that. I
fear delayed reclamation might be there for a reason. Are there rcu read
sides depending on this behavior ?

If yes, then this patch is wrong, and we could change it for

synchronize_rcu();
kfree(....);

Mathieu

> Impact:
> (-) slight slowdown on vunmap
> (+) stop messing up rcu callback lists ;)
> 
> Caught it with DEBUG_RCU_HEAD, running LTTng armall/disarmall in loops.
> Immediate values (with breakpoint-ipi scheme) are still using vunmap.
> 
> ------------[ cut here ]------------
> WARNING: at kernel/rcutree.c:1199 __call_rcu+0x181/0x190()
> Hardware name: X7DAL
> Modules linked in: loop ltt_statedump ltt_userspace_event ipc_tr]
> Pid: 4527, comm: ltt-armall Not tainted 2.6.30.9-trace #30
> Call Trace:
>  [<ffffffff8027b181>] ? __call_rcu+0x181/0x190
>  [<ffffffff8027b181>] ? __call_rcu+0x181/0x190
>  [<ffffffff8023a6a9>] ? warn_slowpath_common+0x79/0xd0
>  [<ffffffff802b4890>] ? rcu_free_va+0x0/0x10
>  [<ffffffff8027b181>] ? __call_rcu+0x181/0x190
>  [<ffffffff802b5298>] ? __purge_vmap_area_lazy+0x1a8/0x1e0
>  [<ffffffff802b59d4>] ? free_unmap_vmap_area_noflush+0x74/0x80
>  [<ffffffff802b5a0e>] ? remove_vm_area+0x2e/0x80
>  [<ffffffff802b5b35>] ? __vunmap+0x45/0xf0
>  [<ffffffffa002e826>] ? ltt_statedump_start+0x7b6/0x820 [ltt_sta]
>  [<ffffffff8068978a>] ? arch_imv_update+0x16a/0x2f0
>  [<ffffffff8027dd13>] ? imv_update_range+0x53/0xa0
>  [<ffffffff8026235b>] ? _module_imv_update+0x4b/0x60
>  [<ffffffff80262385>] ? module_imv_update+0x15/0x30
>  [<ffffffff80280049>] ? marker_probe_register+0x149/0xb90
>  [<ffffffff8040c8e0>] ? ltt_vtrace+0x0/0x8c0
>  [<ffffffff80409330>] ? ltt_marker_connect+0xd0/0x150
>  [<ffffffff8040f8dc>] ? marker_enable_write+0xec/0x120
>  [<ffffffff802c6cb8>] ? __dentry_open+0x268/0x350
>  [<ffffffff80280b2c>] ? marker_probe_cb+0x9c/0x170
>  [<ffffffff80280b2c>] ? marker_probe_cb+0x9c/0x170
>  [<ffffffff802c973b>] ? vfs_write+0xcb/0x170
>  [<ffffffff802c9984>] ? sys_write+0x64/0x130
>  [<ffffffff8020beeb>] ? system_call_fastpath+0x16/0x1b
> ---[ end trace ef92443e716fa199 ]---
> ------------[ cut here ]------------
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> CC: cl@linux-foundation.org
> CC: mingo@elte.hu
> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> CC: linux-mm@kvack.org
> CC: akpm@linux-foundation.org
> ---
>  mm/vmalloc.c |   18 ++----------------
>  1 file changed, 2 insertions(+), 16 deletions(-)
> 
> Index: linux-2.6-lttng/mm/vmalloc.c
> ===================================================================
> --- linux-2.6-lttng.orig/mm/vmalloc.c	2009-10-06 09:37:04.000000000 -0400
> +++ linux-2.6-lttng/mm/vmalloc.c	2009-10-06 09:37:56.000000000 -0400
> @@ -417,13 +417,6 @@ overflow:
>  	return va;
>  }
>  
> -static void rcu_free_va(struct rcu_head *head)
> -{
> -	struct vmap_area *va = container_of(head, struct vmap_area, rcu_head);
> -
> -	kfree(va);
> -}
> -
>  static void __free_vmap_area(struct vmap_area *va)
>  {
>  	BUG_ON(RB_EMPTY_NODE(&va->rb_node));
> @@ -431,7 +424,7 @@ static void __free_vmap_area(struct vmap
>  	RB_CLEAR_NODE(&va->rb_node);
>  	list_del_rcu(&va->list);
>  
> -	call_rcu(&va->rcu_head, rcu_free_va);
> +	kfree(va);
>  }
>  
>  /*
> @@ -757,13 +750,6 @@ static struct vmap_block *new_vmap_block
>  	return vb;
>  }
>  
> -static void rcu_free_vb(struct rcu_head *head)
> -{
> -	struct vmap_block *vb = container_of(head, struct vmap_block, rcu_head);
> -
> -	kfree(vb);
> -}
> -
>  static void free_vmap_block(struct vmap_block *vb)
>  {
>  	struct vmap_block *tmp;
> @@ -778,7 +764,7 @@ static void free_vmap_block(struct vmap_
>  	BUG_ON(tmp != vb);
>  
>  	free_unmap_vmap_area_noflush(vb->va);
> -	call_rcu(&vb->rcu_head, rcu_free_vb);
> +	kfree(vb);
>  }
>  
>  static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
> 
> -- 
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 4/4] vunmap: Fix racy use of rcu_head
@ 2009-10-06 15:23     ` Mathieu Desnoyers
  0 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2009-10-06 15:23 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel, Paul E. McKenney; +Cc: cl, linux-mm

* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> Repetitive use of vmap/vunmap on the same address triggers this bug.
> 
> Simplest fix: directly kfree the data structure rather than doing it lazily.
> 

I assumed that call_rcu in vunmap() is only used to postpone kfree()
from the hot path, but I ask for comfirmation/infirmation about that. I
fear delayed reclamation might be there for a reason. Are there rcu read
sides depending on this behavior ?

If yes, then this patch is wrong, and we could change it for

synchronize_rcu();
kfree(....);

Mathieu

> Impact:
> (-) slight slowdown on vunmap
> (+) stop messing up rcu callback lists ;)
> 
> Caught it with DEBUG_RCU_HEAD, running LTTng armall/disarmall in loops.
> Immediate values (with breakpoint-ipi scheme) are still using vunmap.
> 
> ------------[ cut here ]------------
> WARNING: at kernel/rcutree.c:1199 __call_rcu+0x181/0x190()
> Hardware name: X7DAL
> Modules linked in: loop ltt_statedump ltt_userspace_event ipc_tr]
> Pid: 4527, comm: ltt-armall Not tainted 2.6.30.9-trace #30
> Call Trace:
>  [<ffffffff8027b181>] ? __call_rcu+0x181/0x190
>  [<ffffffff8027b181>] ? __call_rcu+0x181/0x190
>  [<ffffffff8023a6a9>] ? warn_slowpath_common+0x79/0xd0
>  [<ffffffff802b4890>] ? rcu_free_va+0x0/0x10
>  [<ffffffff8027b181>] ? __call_rcu+0x181/0x190
>  [<ffffffff802b5298>] ? __purge_vmap_area_lazy+0x1a8/0x1e0
>  [<ffffffff802b59d4>] ? free_unmap_vmap_area_noflush+0x74/0x80
>  [<ffffffff802b5a0e>] ? remove_vm_area+0x2e/0x80
>  [<ffffffff802b5b35>] ? __vunmap+0x45/0xf0
>  [<ffffffffa002e826>] ? ltt_statedump_start+0x7b6/0x820 [ltt_sta]
>  [<ffffffff8068978a>] ? arch_imv_update+0x16a/0x2f0
>  [<ffffffff8027dd13>] ? imv_update_range+0x53/0xa0
>  [<ffffffff8026235b>] ? _module_imv_update+0x4b/0x60
>  [<ffffffff80262385>] ? module_imv_update+0x15/0x30
>  [<ffffffff80280049>] ? marker_probe_register+0x149/0xb90
>  [<ffffffff8040c8e0>] ? ltt_vtrace+0x0/0x8c0
>  [<ffffffff80409330>] ? ltt_marker_connect+0xd0/0x150
>  [<ffffffff8040f8dc>] ? marker_enable_write+0xec/0x120
>  [<ffffffff802c6cb8>] ? __dentry_open+0x268/0x350
>  [<ffffffff80280b2c>] ? marker_probe_cb+0x9c/0x170
>  [<ffffffff80280b2c>] ? marker_probe_cb+0x9c/0x170
>  [<ffffffff802c973b>] ? vfs_write+0xcb/0x170
>  [<ffffffff802c9984>] ? sys_write+0x64/0x130
>  [<ffffffff8020beeb>] ? system_call_fastpath+0x16/0x1b
> ---[ end trace ef92443e716fa199 ]---
> ------------[ cut here ]------------
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> CC: cl@linux-foundation.org
> CC: mingo@elte.hu
> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> CC: linux-mm@kvack.org
> CC: akpm@linux-foundation.org
> ---
>  mm/vmalloc.c |   18 ++----------------
>  1 file changed, 2 insertions(+), 16 deletions(-)
> 
> Index: linux-2.6-lttng/mm/vmalloc.c
> ===================================================================
> --- linux-2.6-lttng.orig/mm/vmalloc.c	2009-10-06 09:37:04.000000000 -0400
> +++ linux-2.6-lttng/mm/vmalloc.c	2009-10-06 09:37:56.000000000 -0400
> @@ -417,13 +417,6 @@ overflow:
>  	return va;
>  }
>  
> -static void rcu_free_va(struct rcu_head *head)
> -{
> -	struct vmap_area *va = container_of(head, struct vmap_area, rcu_head);
> -
> -	kfree(va);
> -}
> -
>  static void __free_vmap_area(struct vmap_area *va)
>  {
>  	BUG_ON(RB_EMPTY_NODE(&va->rb_node));
> @@ -431,7 +424,7 @@ static void __free_vmap_area(struct vmap
>  	RB_CLEAR_NODE(&va->rb_node);
>  	list_del_rcu(&va->list);
>  
> -	call_rcu(&va->rcu_head, rcu_free_va);
> +	kfree(va);
>  }
>  
>  /*
> @@ -757,13 +750,6 @@ static struct vmap_block *new_vmap_block
>  	return vb;
>  }
>  
> -static void rcu_free_vb(struct rcu_head *head)
> -{
> -	struct vmap_block *vb = container_of(head, struct vmap_block, rcu_head);
> -
> -	kfree(vb);
> -}
> -
>  static void free_vmap_block(struct vmap_block *vb)
>  {
>  	struct vmap_block *tmp;
> @@ -778,7 +764,7 @@ static void free_vmap_block(struct vmap_
>  	BUG_ON(tmp != vb);
>  
>  	free_unmap_vmap_area_noflush(vb->va);
> -	call_rcu(&vb->rcu_head, rcu_free_vb);
> +	kfree(vb);
>  }
>  
>  static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
> 
> -- 
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/4] tree rcu: Add debug RCU head option
  2009-10-06 14:37 ` [patch 2/4] tree rcu: Add debug RCU head option Mathieu Desnoyers
@ 2009-10-06 15:54   ` Eric Dumazet
  2009-10-06 16:09     ` Mathieu Desnoyers
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2009-10-06 15:54 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, Ingo Molnar, linux-kernel, Paul E. McKenney

Mathieu Desnoyers a écrit :
> Poisoning the rcu_head callback list. Only for rcu tree for now.
> 
> Helps finding racy users of call_rcu(), which results in hangs because list
> entries are overwritten and/or skipped. Using the lower bit to poison because
> include/net/dst.h __pad_to_align_refcnt complains when struct rcu_head grows.
> 

I see :)

> --- linux-2.6-lttng.orig/include/linux/rcupdate.h	2009-10-06 10:35:15.000000000 -0400
> +++ linux-2.6-lttng/include/linux/rcupdate.h	2009-10-06 10:35:18.000000000 -0400
> @@ -45,6 +45,8 @@
>   * struct rcu_head - callback structure for use with RCU
>   * @next: next update requests in a list
>   * @func: actual update function to call after the grace period.
> + *
> + * Debug mode assumes func pointer value is word-aligned.

Ouch, I guess you never tried CONFIG_CC_OPTIMIZE_FOR_SIZE=y ?

random extract from "nm -v vmlinux"

c0415e58 T in4_pton
c0415f66 T in6_pton
c04161f0 T in_aton
c0416240 T net_ratelimit
c0416250 t linkwatch_add_event
c0416281 t linkwatch_schedule_work
c0416301 T linkwatch_fire_event
c0416368 t __linkwatch_run_queue
c04164d2 t linkwatch_event
c04164f4 T linkwatch_run_queue
c0416500 T sk_chk_filter
c0416703 t sk_filter_rcu_release
c041671d T sk_detach_filter
c041676a T sk_attach_filter
c0416862 T sk_run_filter
c0416f5d T sk_filter
c0416fc4 t __flow_cache_shrink

gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ../gcc-4.4.1/configure --enable-languages=c,c++ --prefix=/usr
Thread model: posix
gcc version 4.4.1 (GCC)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/4] tree rcu: Add debug RCU head option
  2009-10-06 15:54   ` Eric Dumazet
@ 2009-10-06 16:09     ` Mathieu Desnoyers
  2009-10-06 16:17       ` Eric Dumazet
  0 siblings, 1 reply; 23+ messages in thread
From: Mathieu Desnoyers @ 2009-10-06 16:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: akpm, Ingo Molnar, linux-kernel, Paul E. McKenney

* Eric Dumazet (eric.dumazet@gmail.com) wrote:
> Mathieu Desnoyers a écrit :
> > Poisoning the rcu_head callback list. Only for rcu tree for now.
> > 
> > Helps finding racy users of call_rcu(), which results in hangs because list
> > entries are overwritten and/or skipped. Using the lower bit to poison because
> > include/net/dst.h __pad_to_align_refcnt complains when struct rcu_head grows.
> > 
> 
> I see :)
> 

I don't know if there is an easy fix for __pad_to_align_refcnt ?

> > --- linux-2.6-lttng.orig/include/linux/rcupdate.h	2009-10-06 10:35:15.000000000 -0400
> > +++ linux-2.6-lttng/include/linux/rcupdate.h	2009-10-06 10:35:18.000000000 -0400
> > @@ -45,6 +45,8 @@
> >   * struct rcu_head - callback structure for use with RCU
> >   * @next: next update requests in a list
> >   * @func: actual update function to call after the grace period.
> > + *
> > + * Debug mode assumes func pointer value is word-aligned.
> 
> Ouch, I guess you never tried CONFIG_CC_OPTIMIZE_FOR_SIZE=y ?
> 

oops. Well, simple fix would be : depends on !CC_OPTIMIZE_FOR_SIZE

It's a debug option after all... but I'd prefer growing the structure
with a supplementary field if possible.

Mathieu

> random extract from "nm -v vmlinux"
> 
> c0415e58 T in4_pton
> c0415f66 T in6_pton
> c04161f0 T in_aton
> c0416240 T net_ratelimit
> c0416250 t linkwatch_add_event
> c0416281 t linkwatch_schedule_work
> c0416301 T linkwatch_fire_event
> c0416368 t __linkwatch_run_queue
> c04164d2 t linkwatch_event
> c04164f4 T linkwatch_run_queue
> c0416500 T sk_chk_filter
> c0416703 t sk_filter_rcu_release
> c041671d T sk_detach_filter
> c041676a T sk_attach_filter
> c0416862 T sk_run_filter
> c0416f5d T sk_filter
> c0416fc4 t __flow_cache_shrink
> 
> gcc -v
> Using built-in specs.
> Target: i686-pc-linux-gnu
> Configured with: ../gcc-4.4.1/configure --enable-languages=c,c++ --prefix=/usr
> Thread model: posix
> gcc version 4.4.1 (GCC)
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 2/4] tree rcu: Add debug RCU head option
  2009-10-06 16:09     ` Mathieu Desnoyers
@ 2009-10-06 16:17       ` Eric Dumazet
  2009-10-06 16:35         ` Mathieu Desnoyers
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2009-10-06 16:17 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, Ingo Molnar, linux-kernel, Paul E. McKenney

Mathieu Desnoyers a écrit :
> * Eric Dumazet (eric.dumazet@gmail.com) wrote:
>> Mathieu Desnoyers a écrit :
>>> Poisoning the rcu_head callback list. Only for rcu tree for now.
>>>
>>> Helps finding racy users of call_rcu(), which results in hangs because list
>>> entries are overwritten and/or skipped. Using the lower bit to poison because
>>> include/net/dst.h __pad_to_align_refcnt complains when struct rcu_head grows.
>>>
>> I see :)
>>
> 
> I don't know if there is an easy fix for __pad_to_align_refcnt ?


This check was added to make sure some tbench regression was not added if
dst->__refcnt was moved around, I am sure you dont care about tbench being 10% slower,
do you ?


diff --git a/include/net/dst.h b/include/net/dst.h
index 5a900dd..b8fba74 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -154,7 +154,9 @@ static inline void dst_hold(struct dst_entry * dst)
 	 * If your kernel compilation stops here, please check
 	 * __pad_to_align_refcnt declaration in struct dst_entry
 	 */
+#if !defined(DEBUG_RCU_HEAD)
 	BUILD_BUG_ON(offsetof(struct dst_entry, __refcnt) & 63);
+#endif
 	atomic_inc(&dst->__refcnt);
 }
 


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [patch 2/4] tree rcu: Add debug RCU head option
  2009-10-06 16:17       ` Eric Dumazet
@ 2009-10-06 16:35         ` Mathieu Desnoyers
  0 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2009-10-06 16:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: akpm, Ingo Molnar, linux-kernel, Paul E. McKenney

* Eric Dumazet (eric.dumazet@gmail.com) wrote:
> Mathieu Desnoyers a écrit :
> > * Eric Dumazet (eric.dumazet@gmail.com) wrote:
> >> Mathieu Desnoyers a écrit :
> >>> Poisoning the rcu_head callback list. Only for rcu tree for now.
> >>>
> >>> Helps finding racy users of call_rcu(), which results in hangs because list
> >>> entries are overwritten and/or skipped. Using the lower bit to poison because
> >>> include/net/dst.h __pad_to_align_refcnt complains when struct rcu_head grows.
> >>>
> >> I see :)
> >>
> > 
> > I don't know if there is an easy fix for __pad_to_align_refcnt ?
> 
> 
> This check was added to make sure some tbench regression was not added if
> dst->__refcnt was moved around, I am sure you dont care about tbench being 10% slower,
> do you ?
> 

Indeed, I absolutely don't care when I'm doing debugging :)

Will merge, and add a "struct rcu_head *debug" field in struct rcu_head.

Thanks !

Mathieu

> 
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 5a900dd..b8fba74 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -154,7 +154,9 @@ static inline void dst_hold(struct dst_entry * dst)
>  	 * If your kernel compilation stops here, please check
>  	 * __pad_to_align_refcnt declaration in struct dst_entry
>  	 */
> +#if !defined(DEBUG_RCU_HEAD)
>  	BUILD_BUG_ON(offsetof(struct dst_entry, __refcnt) & 63);
> +#endif
>  	atomic_inc(&dst->__refcnt);
>  }
>  
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 4/4] vunmap: Fix racy use of rcu_head
  2009-10-06 14:37   ` Mathieu Desnoyers
@ 2009-10-06 16:43     ` Christoph Lameter
  -1 siblings, 0 replies; 23+ messages in thread
From: Christoph Lameter @ 2009-10-06 16:43 UTC (permalink / raw)
  To: nickpiggin, Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Paul E. McKenney, linux-mm

On Tue, 6 Oct 2009, Mathieu Desnoyers wrote:

> Simplest fix: directly kfree the data structure rather than doing it lazily.

The delay is necessary as far as I can tell for performance reasons. But
Nick was the last one fiddling around with the subsystem as far as I
remember. CCing him. May be he has a minute to think about a fix that
preserved performance.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 4/4] vunmap: Fix racy use of rcu_head
@ 2009-10-06 16:43     ` Christoph Lameter
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Lameter @ 2009-10-06 16:43 UTC (permalink / raw)
  To: nickpiggin, Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Paul E. McKenney, linux-mm

On Tue, 6 Oct 2009, Mathieu Desnoyers wrote:

> Simplest fix: directly kfree the data structure rather than doing it lazily.

The delay is necessary as far as I can tell for performance reasons. But
Nick was the last one fiddling around with the subsystem as far as I
remember. CCing him. May be he has a minute to think about a fix that
preserved performance.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 4/4] vunmap: Fix racy use of rcu_head
  2009-10-06 16:43     ` Christoph Lameter
@ 2009-10-06 17:33       ` Mathieu Desnoyers
  -1 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2009-10-06 17:33 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: nickpiggin, akpm, Ingo Molnar, linux-kernel, Paul E. McKenney, linux-mm

* Christoph Lameter (cl@linux-foundation.org) wrote:
> On Tue, 6 Oct 2009, Mathieu Desnoyers wrote:
> 
> > Simplest fix: directly kfree the data structure rather than doing it lazily.
> 
> The delay is necessary as far as I can tell for performance reasons. But
> Nick was the last one fiddling around with the subsystem as far as I
> remember. CCing him. May be he has a minute to think about a fix that
> preserved performance.
> 

Thanks,

More info on the problem:

I added a check in alloc_vmap_area() at:

        va = kmalloc_node(sizeof(struct vmap_area),
                        gfp_mask & GFP_RECLAIM_MASK, node);
        WARN_ON(va->rcu_head.debug == LIST_POISON1);

(memory allocation debugging is off in my config)

I used this along with my new call rcu debug patch to check if memory is
reallocated before the callback (doing kfree) is executed. It triggers
this:

WARNING: at mm/vmalloc.c:343 alloc_vmap_area+0x305/0x340()
Hardware name: X7DAL
Modules linked in: loop ltt_statedump ltt_userspace_event ipc_tr]
Pid: 5584, comm: ltt-disarmall Not tainted 2.6.30.9-trace #41
Call Trace:
 [<ffffffff802b8665>] ? alloc_vmap_area+0x305/0x340
 [<ffffffff802b8665>] ? alloc_vmap_area+0x305/0x340
 [<ffffffff8023cf29>] ? warn_slowpath_common+0x79/0xd0
 [<ffffffff802b8665>] ? alloc_vmap_area+0x305/0x340
 [<ffffffff80237670>] ? default_wake_function+0x0/0x10
 [<ffffffff802b8769>] ? __get_vm_area_node+0xc9/0x1d0
 [<ffffffff805ae965>] ? sys_getsockopt+0x85/0x160
 [<ffffffff8040f160>] ? ltt_vtrace+0x0/0x8d0
 [<ffffffff802b88dd>] ? get_vm_area_caller+0x2d/0x40
 [<ffffffff8068bc3e>] ? arch_imv_update+0x10e/0x2f0
 [<ffffffff802b93aa>] ? vmap+0x4a/0x80
 [<ffffffff8068bc3e>] ? arch_imv_update+0x10e/0x2f0
 [<ffffffff80283e6d>] ? get_tracepoint+0x25d/0x290
 [<ffffffff80280b93>] ? imv_update_range+0x53/0xa0
 [<ffffffff80284821>] ? tracepoint_update_probes+0x21/0x30
 [<ffffffff802848b3>] ? tracepoint_probe_update_all+0x83/0x100
 [<ffffffff80282ac1>] ? marker_update_probes+0x21/0x40
 [<ffffffff8040f160>] ? ltt_vtrace+0x0/0x8d0
 [<ffffffff80282d5d>] ? marker_probe_unregister+0xad/0x130
 [<ffffffff8040b9dc>] ? ltt_marker_disconnect+0xdc/0x120
 [<ffffffff804121d2>] ? marker_enable_write+0x112/0x120
 [<ffffffff80688dd0>] ? _spin_unlock+0x10/0x30
 [<ffffffff802e6b36>] ? mnt_drop_write+0x76/0x180
 [<ffffffff802d96dd>] ? do_filp_open+0x2cd/0xa00
 [<ffffffff804377c1>] ? tty_write+0x221/0x270
 [<ffffffff802cc6cb>] ? vfs_write+0xcb/0x170
 [<ffffffff802cc914>] ? sys_write+0x64/0x130
 [<ffffffff8020beab>] ? system_call_fastpath+0x16/0x1b
---[ end trace 7ef506680d7a9e26 ]---

Could it be that:

static void __free_vmap_area(struct vmap_area *va)
{
        BUG_ON(RB_EMPTY_NODE(&va->rb_node));
        rb_erase(&va->rb_node, &vmap_area_root);
        RB_CLEAR_NODE(&va->rb_node);
        list_del_rcu(&va->list);

        call_rcu(&va->rcu_head, rcu_free_va);
}

(especially clearing from the rb tree and va list)
allows reallocation of the node by kmalloc_node before its reclamation
(only done later by rcu_free_va) ?

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 4/4] vunmap: Fix racy use of rcu_head
@ 2009-10-06 17:33       ` Mathieu Desnoyers
  0 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2009-10-06 17:33 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: nickpiggin, akpm, Ingo Molnar, linux-kernel, Paul E. McKenney, linux-mm

* Christoph Lameter (cl@linux-foundation.org) wrote:
> On Tue, 6 Oct 2009, Mathieu Desnoyers wrote:
> 
> > Simplest fix: directly kfree the data structure rather than doing it lazily.
> 
> The delay is necessary as far as I can tell for performance reasons. But
> Nick was the last one fiddling around with the subsystem as far as I
> remember. CCing him. May be he has a minute to think about a fix that
> preserved performance.
> 

Thanks,

More info on the problem:

I added a check in alloc_vmap_area() at:

        va = kmalloc_node(sizeof(struct vmap_area),
                        gfp_mask & GFP_RECLAIM_MASK, node);
        WARN_ON(va->rcu_head.debug == LIST_POISON1);

(memory allocation debugging is off in my config)

I used this along with my new call rcu debug patch to check if memory is
reallocated before the callback (doing kfree) is executed. It triggers
this:

WARNING: at mm/vmalloc.c:343 alloc_vmap_area+0x305/0x340()
Hardware name: X7DAL
Modules linked in: loop ltt_statedump ltt_userspace_event ipc_tr]
Pid: 5584, comm: ltt-disarmall Not tainted 2.6.30.9-trace #41
Call Trace:
 [<ffffffff802b8665>] ? alloc_vmap_area+0x305/0x340
 [<ffffffff802b8665>] ? alloc_vmap_area+0x305/0x340
 [<ffffffff8023cf29>] ? warn_slowpath_common+0x79/0xd0
 [<ffffffff802b8665>] ? alloc_vmap_area+0x305/0x340
 [<ffffffff80237670>] ? default_wake_function+0x0/0x10
 [<ffffffff802b8769>] ? __get_vm_area_node+0xc9/0x1d0
 [<ffffffff805ae965>] ? sys_getsockopt+0x85/0x160
 [<ffffffff8040f160>] ? ltt_vtrace+0x0/0x8d0
 [<ffffffff802b88dd>] ? get_vm_area_caller+0x2d/0x40
 [<ffffffff8068bc3e>] ? arch_imv_update+0x10e/0x2f0
 [<ffffffff802b93aa>] ? vmap+0x4a/0x80
 [<ffffffff8068bc3e>] ? arch_imv_update+0x10e/0x2f0
 [<ffffffff80283e6d>] ? get_tracepoint+0x25d/0x290
 [<ffffffff80280b93>] ? imv_update_range+0x53/0xa0
 [<ffffffff80284821>] ? tracepoint_update_probes+0x21/0x30
 [<ffffffff802848b3>] ? tracepoint_probe_update_all+0x83/0x100
 [<ffffffff80282ac1>] ? marker_update_probes+0x21/0x40
 [<ffffffff8040f160>] ? ltt_vtrace+0x0/0x8d0
 [<ffffffff80282d5d>] ? marker_probe_unregister+0xad/0x130
 [<ffffffff8040b9dc>] ? ltt_marker_disconnect+0xdc/0x120
 [<ffffffff804121d2>] ? marker_enable_write+0x112/0x120
 [<ffffffff80688dd0>] ? _spin_unlock+0x10/0x30
 [<ffffffff802e6b36>] ? mnt_drop_write+0x76/0x180
 [<ffffffff802d96dd>] ? do_filp_open+0x2cd/0xa00
 [<ffffffff804377c1>] ? tty_write+0x221/0x270
 [<ffffffff802cc6cb>] ? vfs_write+0xcb/0x170
 [<ffffffff802cc914>] ? sys_write+0x64/0x130
 [<ffffffff8020beab>] ? system_call_fastpath+0x16/0x1b
---[ end trace 7ef506680d7a9e26 ]---

Could it be that:

static void __free_vmap_area(struct vmap_area *va)
{
        BUG_ON(RB_EMPTY_NODE(&va->rb_node));
        rb_erase(&va->rb_node, &vmap_area_root);
        RB_CLEAR_NODE(&va->rb_node);
        list_del_rcu(&va->list);

        call_rcu(&va->rcu_head, rcu_free_va);
}

(especially clearing from the rb tree and va list)
allows reallocation of the node by kmalloc_node before its reclamation
(only done later by rcu_free_va) ?

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] DEBUG_RCU_HEAD: Debug and fix racy call_rcu() users
  2009-10-06 14:37 [patch 0/4] DEBUG_RCU_HEAD: Debug and fix racy call_rcu() users Mathieu Desnoyers
                   ` (3 preceding siblings ...)
  2009-10-06 14:37   ` Mathieu Desnoyers
@ 2009-10-07  4:20 ` Paul E. McKenney
  2009-10-07 12:36   ` Mathieu Desnoyers
  2009-10-12 18:07 ` Ingo Molnar
  5 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2009-10-07  4:20 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, Ingo Molnar, linux-kernel

On Tue, Oct 06, 2009 at 10:37:28AM -0400, Mathieu Desnoyers wrote:
> Here is a patchset, done on 2.6.30.9, which permits to detect and fix racy
> call_rcu() users.
> 
> Fix for vunmap, which happens to be one of them, is included.
> 
> Note that some false-positives might come up when call_rcu() is called on
> uninitialized struct rcu_head. The solution is, surprisingly enough, to
> initialize them.

Very cool, Mathieu!!!

I do like adding the debug-only field to struct rcu_head.  Lots of
additional places need initialization, or am I missing a trick here?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] DEBUG_RCU_HEAD: Debug and fix racy call_rcu() users
  2009-10-07  4:20 ` [patch 0/4] DEBUG_RCU_HEAD: Debug and fix racy call_rcu() users Paul E. McKenney
@ 2009-10-07 12:36   ` Mathieu Desnoyers
  0 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2009-10-07 12:36 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: akpm, Ingo Molnar, linux-kernel

* Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> On Tue, Oct 06, 2009 at 10:37:28AM -0400, Mathieu Desnoyers wrote:
> > Here is a patchset, done on 2.6.30.9, which permits to detect and fix racy
> > call_rcu() users.
> > 
> > Fix for vunmap, which happens to be one of them, is included.
> > 
> > Note that some false-positives might come up when call_rcu() is called on
> > uninitialized struct rcu_head. The solution is, surprisingly enough, to
> > initialize them.
> 
> Very cool, Mathieu!!!
> 
> I do like adding the debug-only field to struct rcu_head.  Lots of
> additional places need initialization, or am I missing a trick here?
> 

In my case, only 2 sites in the process management and 1 site in the now
deprecated marker.c complained about not being initialized.

I can guess we will find others though. Please see the v2 of my patch,
which adds the supplementary "debug" field rather than depending on
word-aligned pointers. I volountarily complain about the .debug pointer
being non-NULL when we find an uninitialized site. I could be more
specific and complain "differently" if the pointer happens to be random
vs if it's LIST_POISON1.

Thanks,

Mathieu

> 							Thanx, Paul

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] DEBUG_RCU_HEAD: Debug and fix racy call_rcu() users
  2009-10-06 14:37 [patch 0/4] DEBUG_RCU_HEAD: Debug and fix racy call_rcu() users Mathieu Desnoyers
                   ` (4 preceding siblings ...)
  2009-10-07  4:20 ` [patch 0/4] DEBUG_RCU_HEAD: Debug and fix racy call_rcu() users Paul E. McKenney
@ 2009-10-12 18:07 ` Ingo Molnar
  2009-10-12 18:30   ` Mathieu Desnoyers
  5 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2009-10-12 18:07 UTC (permalink / raw)
  To: Mathieu Desnoyers, Thomas Gleixner; +Cc: akpm, linux-kernel, Paul E. McKenney


* Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> Here is a patchset, done on 2.6.30.9, which permits to detect and fix 
> racy call_rcu() users.

Nice idea - but any reason why this isnt using the debugojects 
framework? That debugging framework is upstream already and allows this 
kind of object lifetime debugging - in a much broader way.

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] DEBUG_RCU_HEAD: Debug and fix racy call_rcu() users
  2009-10-12 18:07 ` Ingo Molnar
@ 2009-10-12 18:30   ` Mathieu Desnoyers
  2009-10-12 18:37     ` Ingo Molnar
  0 siblings, 1 reply; 23+ messages in thread
From: Mathieu Desnoyers @ 2009-10-12 18:30 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, akpm, linux-kernel, Paul E. McKenney

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> 
> > Here is a patchset, done on 2.6.30.9, which permits to detect and fix 
> > racy call_rcu() users.
> 
> Nice idea - but any reason why this isnt using the debugojects 
> framework? That debugging framework is upstream already and allows this 
> kind of object lifetime debugging - in a much broader way.
> 

Other than: it did not occur to me, no, there is no reason for not using
debugobjects there. :-)

As long as we can tag the object as
- initialized
- active
- inactive

And dump a nice fat warning when the object is activated twice, that
should suffice. Now time is a bit short on my side, but I'll keep in
mind to respin a version on top of debugobjects soon.

Thanks,

Mathieu


> 	Ingo

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] DEBUG_RCU_HEAD: Debug and fix racy call_rcu() users
  2009-10-12 18:30   ` Mathieu Desnoyers
@ 2009-10-12 18:37     ` Ingo Molnar
  2009-10-12 18:56       ` Thomas Gleixner
  0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2009-10-12 18:37 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Thomas Gleixner, akpm, linux-kernel, Paul E. McKenney


* Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> * Ingo Molnar (mingo@elte.hu) wrote:
> > 
> > * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> > 
> > > Here is a patchset, done on 2.6.30.9, which permits to detect and fix 
> > > racy call_rcu() users.
> > 
> > Nice idea - but any reason why this isnt using the debugojects 
> > framework? That debugging framework is upstream already and allows this 
> > kind of object lifetime debugging - in a much broader way.
> > 
> 
> Other than: it did not occur to me, no, there is no reason for not using
> debugobjects there. :-)
> 
> As long as we can tag the object as
> - initialized
> - active
> - inactive
> 
> And dump a nice fat warning when the object is activated twice, that
> should suffice. Now time is a bit short on my side, but I'll keep in
> mind to respin a version on top of debugobjects soon.

ok, thanks.

Commit to check would be:

  237fc6e: add hrtimer specific debugobjects code
  c6f3a97: debugobjects: add timer specific object debugging code

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch 0/4] DEBUG_RCU_HEAD: Debug and fix racy call_rcu() users
  2009-10-12 18:37     ` Ingo Molnar
@ 2009-10-12 18:56       ` Thomas Gleixner
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2009-10-12 18:56 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Mathieu Desnoyers, akpm, linux-kernel, Paul E. McKenney

On Mon, 12 Oct 2009, Ingo Molnar wrote:
> * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> 
> > * Ingo Molnar (mingo@elte.hu) wrote:
> > > 
> > > * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> > > 
> > > > Here is a patchset, done on 2.6.30.9, which permits to detect and fix 
> > > > racy call_rcu() users.
> > > 
> > > Nice idea - but any reason why this isnt using the debugojects 
> > > framework? That debugging framework is upstream already and allows this 
> > > kind of object lifetime debugging - in a much broader way.
> > > 
> > 
> > Other than: it did not occur to me, no, there is no reason for not using
> > debugobjects there. :-)
> > 
> > As long as we can tag the object as
> > - initialized
> > - active
> > - inactive
> > 
> > And dump a nice fat warning when the object is activated twice, that
> > should suffice. Now time is a bit short on my side, but I'll keep in
> > mind to respin a version on top of debugobjects soon.

It does, and it also provides you a mechanism to prevent
damage. i.e. you can provide a callback function which can deactivate
an active object before activating it again. That turned out to be
useful for timer wreckage which happened usually way after the
offending code corrupted lists etc.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2009-10-12 18:57 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-10-06 14:37 [patch 0/4] DEBUG_RCU_HEAD: Debug and fix racy call_rcu() users Mathieu Desnoyers
2009-10-06 14:37 ` [patch 1/4] kernel call_rcu usage: initialize rcu_head structures Mathieu Desnoyers
2009-10-06 15:19   ` [patch 1/4] kernel call_rcu usage: initialize rcu_head structures (v2) Mathieu Desnoyers
2009-10-06 14:37 ` [patch 2/4] tree rcu: Add debug RCU head option Mathieu Desnoyers
2009-10-06 15:54   ` Eric Dumazet
2009-10-06 16:09     ` Mathieu Desnoyers
2009-10-06 16:17       ` Eric Dumazet
2009-10-06 16:35         ` Mathieu Desnoyers
2009-10-06 14:37 ` [patch 3/4] markers call_rcu usage: initialize rcu_head structures Mathieu Desnoyers
2009-10-06 14:37 ` [patch 4/4] vunmap: Fix racy use of rcu_head Mathieu Desnoyers
2009-10-06 14:37   ` Mathieu Desnoyers
2009-10-06 15:23   ` Mathieu Desnoyers
2009-10-06 15:23     ` Mathieu Desnoyers
2009-10-06 16:43   ` Christoph Lameter
2009-10-06 16:43     ` Christoph Lameter
2009-10-06 17:33     ` Mathieu Desnoyers
2009-10-06 17:33       ` Mathieu Desnoyers
2009-10-07  4:20 ` [patch 0/4] DEBUG_RCU_HEAD: Debug and fix racy call_rcu() users Paul E. McKenney
2009-10-07 12:36   ` Mathieu Desnoyers
2009-10-12 18:07 ` Ingo Molnar
2009-10-12 18:30   ` Mathieu Desnoyers
2009-10-12 18:37     ` Ingo Molnar
2009-10-12 18:56       ` Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.