* call_rcu from trace_preempt
@ 2015-06-15 22:24 Alexei Starovoitov
  2015-06-15 23:07 ` Paul E. McKenney
  0 siblings, 1 reply; 41+ messages in thread
From: Alexei Starovoitov @ 2015-06-15 22:24 UTC (permalink / raw)
  To: paulmck, Daniel Wagner, LKML


Hi Paul,

I've been debugging the issue reported by Daniel:
http://thread.gmane.org/gmane.linux.kernel/1974304/focus=1974304
and it seems I narrowed it down to recursive call_rcu.
From trace_preempt_on() I'm doing:
e = kmalloc(sizeof(struct elem), GFP_ATOMIC)
kfree_rcu(e, rcu)

which is causing all sorts of corruptions like:
[    2.074175] WARNING: CPU: 0 PID: 3 at ../lib/debugobjects.c:263 
debug_print_object+0x8c/0xb0()
[    2.075567] ODEBUG: active_state not available (active state 0) 
object type: rcu_head hint:           (null)

[    2.102141] WARNING: CPU: 0 PID: 3 at ../lib/debugobjects.c:263 
debug_print_object+0x8c/0xb0()
[    2.103547] ODEBUG: deactivate not available (active state 0) object 
type: rcu_head hint:           (null)

[    2.253995] WARNING: CPU: 0 PID: 7 at ../kernel/rcu/tree.c:2976 
__call_rcu.constprop.67+0x1e5/0x350()
[    2.255510] __call_rcu(): Leaked duplicate callback

Sometimes stack looks like:
[    2.145163] WARNING: CPU: 0 PID: 102 at ../lib/debugobjects.c:263 
debug_print_object+0x8c/0xb0()
[    2.147465] ODEBUG: active_state not available (active state 0) 
object type: rcu_head hint:           (null)
[    2.148022] Modules linked in:
[    2.148022] CPU: 0 PID: 102 Comm: systemd-udevd Not tainted 
4.1.0-rc7+ #653
[    2.148022] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS rel-1.7.5-rc1-0-gb1d4dc9-20140515_140003-nilsson.home.kraxel.org 
04/01/2014
[    2.148022]  ffffffff81a34f77 ffff88000fc03d18 ffffffff81781ed4 
0000000000000105
[    2.148022]  ffff88000fc03d68 ffff88000fc03d58 ffffffff81064e57 
0000000000000000
[    2.148022]  ffff88000fc03e20 ffffffff81c50f00 ffffffff81a34fdf 
0000000000000286
[    2.148022] Call Trace:
[    2.148022]  <IRQ>  [<ffffffff81781ed4>] dump_stack+0x4f/0x7b
[    2.148022]  [<ffffffff81064e57>] warn_slowpath_common+0x97/0xe0
[    2.148022]  [<ffffffff81064f56>] warn_slowpath_fmt+0x46/0x50
[    2.148022]  [<ffffffff813d305c>] debug_print_object+0x8c/0xb0
[    2.148022]  [<ffffffff813d3cf6>] ? debug_object_active_state+0x66/0x160
[    2.148022]  [<ffffffff813d3d81>] debug_object_active_state+0xf1/0x160
[    2.148022]  [<ffffffff810e33b1>] rcu_process_callbacks+0x301/0xae0
[    2.148022]  [<ffffffff810e3397>] ? rcu_process_callbacks+0x2e7/0xae0
[    2.148022]  [<ffffffff810e9f28>] ? run_timer_softirq+0x218/0x4c0
[    2.148022]  [<ffffffff81069dff>] __do_softirq+0x14f/0x670
[    2.148022]  [<ffffffff8106a5f5>] irq_exit+0xa5/0xb0
[    2.148022]  [<ffffffff8178ea6a>] smp_apic_timer_interrupt+0x4a/0x60
[    2.148022]  [<ffffffff8178cea0>] apic_timer_interrupt+0x70/0x80
[    2.148022]  <EOI>  [<ffffffff813d363c>] ? 
debug_object_activate+0x9c/0x1e0
[    2.148022]  [<ffffffff8178bd47>] ? _raw_spin_unlock_irqrestore+0x67/0x80
[    2.148022]  [<ffffffff813d36f6>] debug_object_activate+0x156/0x1e0
[    2.148022]  [<ffffffff810de037>] rcuhead_fixup_activate+0x37/0x40
[    2.148022]  [<ffffffff813d36a1>] debug_object_activate+0x101/0x1e0
[    2.148022]  [<ffffffff8178bd2b>] ? _raw_spin_unlock_irqrestore+0x4b/0x80
[    2.148022]  [<ffffffff810e2d26>] __call_rcu.constprop.67+0x46/0x350
[    2.148022]  [<ffffffff813d3524>] ? __debug_object_init+0x3f4/0x430
[    2.148022]  [<ffffffff8178bd2b>] ? _raw_spin_unlock_irqrestore+0x4b/0x80
[    2.148022]  [<ffffffff810e308a>] kfree_call_rcu+0x1a/0x20
[    2.148022]  [<ffffffff8115e230>] trace_preempt_on+0x180/0x290
[    2.148022]  [<ffffffff8115e17e>] ? trace_preempt_on+0xce/0x290
[    2.148022]  [<ffffffff810991c3>] preempt_count_sub+0x73/0xf0
[    2.148022]  [<ffffffff8178bd2b>] _raw_spin_unlock_irqrestore+0x4b/0x80
[    2.148022]  [<ffffffff813d3524>] __debug_object_init+0x3f4/0x430
[    2.148022]  [<ffffffff8115e23c>] ? trace_preempt_on+0x18c/0x290
[    2.148022]  [<ffffffff813d357b>] debug_object_init+0x1b/0x20
[    2.148022]  [<ffffffff810de028>] rcuhead_fixup_activate+0x28/0x40
[    2.148022]  [<ffffffff813d36a1>] debug_object_activate+0x101/0x1e0
[    2.148022]  [<ffffffff812036b0>] ? get_max_files+0x20/0x20
[    2.148022]  [<ffffffff810e2d26>] __call_rcu.constprop.67+0x46/0x350
[    2.148022]  [<ffffffff810e30a7>] call_rcu+0x17/0x20
[    2.148022]  [<ffffffff81203903>] __fput+0x183/0x200
[    2.148022]  [<ffffffff812039ce>] ____fput+0xe/0x10
[    2.148022]  [<ffffffff81088dd5>] task_work_run+0xb5/0xe0
[    2.148022]  [<ffffffff81002c74>] do_notify_resume+0x64/0x80
[    2.148022]  [<ffffffff8178c26c>] int_signal+0x12/0x17

My reading of the code is that the debug_object_*() bits are reporting a real
problem. In the above trace the call
debug_rcu_head_unqueue(list);
from rcu_do_batch() is not finding 'list' among the tracked objects.
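
For reference, the pairing being checked there is roughly the following
(a sketch, not verbatim kernel code):

	/* __call_rcu() side: mark the rcu_head as queued */
	debug_rcu_head_queue(head);	/* ~ debug_object_activate()   */
	...
	/* rcu_do_batch() side: mark it un-queued before invoking it */
	debug_rcu_head_unqueue(head);	/* ~ debug_object_deactivate() */

so the "active_state not available" warning means the deactivate side
found an rcu_head that ODEBUG was not tracking as queued, or found it
in an unexpected state.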

I know that doing call_rcu() from trace_preempt is ill-advised,
but I still want to understand why call_rcu corrupts memory.

Attaching a patch that I'm using for debugging.
It does a recursion check with preemption disabled, so the number of
nested call_rcu invocations is no more than 2.
Also, if I replace the kfree_rcu in this patch with a regular kfree,
everything works fine.

I'm seeing these crashes in a VM with a _single_ CPU.
The kernel is built with CONFIG_PREEMPT, CONFIG_PREEMPT_TRACER and
CONFIG_DEBUG_OBJECTS_RCU_HEAD.

It is also interesting that the size of
struct elem {
        u64 pad[32];
        struct rcu_head rcu;
};
that I'm using in kmalloc/kfree_rcu changes the crash.
If the padding is zero, the kernel just locks up; with pad[1] I see
one type of odebug warning, with pad[32] another.

Any advice on where to look is greatly appreciated.

Thanks!


[-- Attachment #2: trace.patch --]
[-- Type: text/plain, Size: 1209 bytes --]

diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 8523ea345f2b..89433a83dd2d 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -13,6 +13,7 @@
 #include <linux/uaccess.h>
 #include <linux/module.h>
 #include <linux/ftrace.h>
+#include <linux/slab.h>
 
 #include "trace.h"
 
@@ -510,8 +511,42 @@ EXPORT_SYMBOL(trace_hardirqs_off_caller);
 #endif /*  CONFIG_IRQSOFF_TRACER */
 
 #ifdef CONFIG_PREEMPT_TRACER
+struct elem {
+	u64 pad[32];
+	struct rcu_head rcu;
+};
+
+static DEFINE_PER_CPU(int, prog_active);
+static void * test_alloc(void)
+{
+	struct elem *e = NULL;
+
+	if (in_nmi())
+		return e;
+
+	preempt_disable_notrace();
+	if (unlikely(__this_cpu_inc_return(prog_active) != 1))
+		goto out;
+
+	rcu_read_lock();
+	e = kmalloc(sizeof(struct elem), GFP_ATOMIC);
+	rcu_read_unlock();
+	if (!e)
+		goto out;
+
+	kfree_rcu(e, rcu);
+out:
+	__this_cpu_dec(prog_active);
+	preempt_enable_no_resched_notrace();
+	return e;
+}
+
 void trace_preempt_on(unsigned long a0, unsigned long a1)
 {
+	void * buf = 0;
+	static int cnt = 0;
+	if (cnt++ > 3000000)
+		buf = test_alloc();
 	if (preempt_trace() && !irq_trace())
 		stop_critical_timing(a0, a1);
 }


* Re: call_rcu from trace_preempt
  2015-06-15 22:24 call_rcu from trace_preempt Alexei Starovoitov
@ 2015-06-15 23:07 ` Paul E. McKenney
  2015-06-16  1:09   ` Alexei Starovoitov
  0 siblings, 1 reply; 41+ messages in thread
From: Paul E. McKenney @ 2015-06-15 23:07 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Daniel Wagner, LKML

On Mon, Jun 15, 2015 at 03:24:29PM -0700, Alexei Starovoitov wrote:
> Hi Paul,
> 
> I've been debugging the issue reported by Daniel:
> http://thread.gmane.org/gmane.linux.kernel/1974304/focus=1974304
> and it seems I narrowed it down to recursive call_rcu.

By "recursive call_rcu()", you mean invoking call_rcu() twice in a row
on the same memory, like this?

	call_rcu(&p->rh, some_callback_function);
	do_something_quick();
	call_rcu(&p->rh, another_callback_function);

Because this is perfectly legal:

	void endpoint_callback_function(struct rcu_head *p)
	{
		struct foo *fp = container_of(p, struct foo, rh);

		kfree(fp);
	}

	void recirculating_callback_function(struct rcu_head *p)
	{
		call_rcu(p, endpoint_callback_function);
	}

	...

	call_rcu(&fp->rh, recirculating_callback_function);

This sort of thing is actually used in some situations involving
RCU and reference counters.

> From trace_preempt_on() I'm doing:
> e = kmalloc(sizeof(struct elem), GFP_ATOMIC)
> kfree_rcu(e, rcu)

As written, this should be OK, assuming that "rcu" is a field of type
"struct rcu_head" (not a pointer!) within "struct elem".

> which causing all sorts of corruptions like:
> [    2.074175] WARNING: CPU: 0 PID: 3 at ../lib/debugobjects.c:263
> debug_print_object+0x8c/0xb0()
> [    2.075567] ODEBUG: active_state not available (active state 0)
> object type: rcu_head hint:           (null)
> 
> [    2.102141] WARNING: CPU: 0 PID: 3 at ../lib/debugobjects.c:263
> debug_print_object+0x8c/0xb0()
> [    2.103547] ODEBUG: deactivate not available (active state 0)
> object type: rcu_head hint:           (null)
> 
> [    2.253995] WARNING: CPU: 0 PID: 7 at ../kernel/rcu/tree.c:2976
> __call_rcu.constprop.67+0x1e5/0x350()
> [    2.255510] __call_rcu(): Leaked duplicate callback
> 
> Sometimes stack looks like:
> [    2.145163] WARNING: CPU: 0 PID: 102 at ../lib/debugobjects.c:263
> debug_print_object+0x8c/0xb0()
> [    2.147465] ODEBUG: active_state not available (active state 0)
> object type: rcu_head hint:           (null)
> [    2.148022] Modules linked in:
> [    2.148022] CPU: 0 PID: 102 Comm: systemd-udevd Not tainted
> 4.1.0-rc7+ #653
> [    2.148022] Hardware name: QEMU Standard PC (i440FX + PIIX,
> 1996), BIOS
> rel-1.7.5-rc1-0-gb1d4dc9-20140515_140003-nilsson.home.kraxel.org
> 04/01/2014
> [    2.148022]  ffffffff81a34f77 ffff88000fc03d18 ffffffff81781ed4
> 0000000000000105
> [    2.148022]  ffff88000fc03d68 ffff88000fc03d58 ffffffff81064e57
> 0000000000000000
> [    2.148022]  ffff88000fc03e20 ffffffff81c50f00 ffffffff81a34fdf
> 0000000000000286
> [    2.148022] Call Trace:
> [    2.148022]  <IRQ>  [<ffffffff81781ed4>] dump_stack+0x4f/0x7b
> [    2.148022]  [<ffffffff81064e57>] warn_slowpath_common+0x97/0xe0
> [    2.148022]  [<ffffffff81064f56>] warn_slowpath_fmt+0x46/0x50
> [    2.148022]  [<ffffffff813d305c>] debug_print_object+0x8c/0xb0
> [    2.148022]  [<ffffffff813d3cf6>] ? debug_object_active_state+0x66/0x160
> [    2.148022]  [<ffffffff813d3d81>] debug_object_active_state+0xf1/0x160
> [    2.148022]  [<ffffffff810e33b1>] rcu_process_callbacks+0x301/0xae0
> [    2.148022]  [<ffffffff810e3397>] ? rcu_process_callbacks+0x2e7/0xae0
> [    2.148022]  [<ffffffff810e9f28>] ? run_timer_softirq+0x218/0x4c0
> [    2.148022]  [<ffffffff81069dff>] __do_softirq+0x14f/0x670
> [    2.148022]  [<ffffffff8106a5f5>] irq_exit+0xa5/0xb0
> [    2.148022]  [<ffffffff8178ea6a>] smp_apic_timer_interrupt+0x4a/0x60
> [    2.148022]  [<ffffffff8178cea0>] apic_timer_interrupt+0x70/0x80
> [    2.148022]  <EOI>  [<ffffffff813d363c>] ?
> debug_object_activate+0x9c/0x1e0
> [    2.148022]  [<ffffffff8178bd47>] ? _raw_spin_unlock_irqrestore+0x67/0x80
> [    2.148022]  [<ffffffff813d36f6>] debug_object_activate+0x156/0x1e0
> [    2.148022]  [<ffffffff810de037>] rcuhead_fixup_activate+0x37/0x40
> [    2.148022]  [<ffffffff813d36a1>] debug_object_activate+0x101/0x1e0
> [    2.148022]  [<ffffffff8178bd2b>] ? _raw_spin_unlock_irqrestore+0x4b/0x80
> [    2.148022]  [<ffffffff810e2d26>] __call_rcu.constprop.67+0x46/0x350
> [    2.148022]  [<ffffffff813d3524>] ? __debug_object_init+0x3f4/0x430
> [    2.148022]  [<ffffffff8178bd2b>] ? _raw_spin_unlock_irqrestore+0x4b/0x80
> [    2.148022]  [<ffffffff810e308a>] kfree_call_rcu+0x1a/0x20
> [    2.148022]  [<ffffffff8115e230>] trace_preempt_on+0x180/0x290
> [    2.148022]  [<ffffffff8115e17e>] ? trace_preempt_on+0xce/0x290
> [    2.148022]  [<ffffffff810991c3>] preempt_count_sub+0x73/0xf0
> [    2.148022]  [<ffffffff8178bd2b>] _raw_spin_unlock_irqrestore+0x4b/0x80
> [    2.148022]  [<ffffffff813d3524>] __debug_object_init+0x3f4/0x430
> [    2.148022]  [<ffffffff8115e23c>] ? trace_preempt_on+0x18c/0x290
> [    2.148022]  [<ffffffff813d357b>] debug_object_init+0x1b/0x20
> [    2.148022]  [<ffffffff810de028>] rcuhead_fixup_activate+0x28/0x40
> [    2.148022]  [<ffffffff813d36a1>] debug_object_activate+0x101/0x1e0
> [    2.148022]  [<ffffffff812036b0>] ? get_max_files+0x20/0x20
> [    2.148022]  [<ffffffff810e2d26>] __call_rcu.constprop.67+0x46/0x350
> [    2.148022]  [<ffffffff810e30a7>] call_rcu+0x17/0x20
> [    2.148022]  [<ffffffff81203903>] __fput+0x183/0x200
> [    2.148022]  [<ffffffff812039ce>] ____fput+0xe/0x10
> [    2.148022]  [<ffffffff81088dd5>] task_work_run+0xb5/0xe0
> [    2.148022]  [<ffffffff81002c74>] do_notify_resume+0x64/0x80
> [    2.148022]  [<ffffffff8178c26c>] int_signal+0x12/0x17
> 
> My reading of the code is debug_object_*() bits are reporting real
> problem. In the above trace the call
> debug_rcu_head_unqueue(list);
> from rcu_do_batch() is not finding 'list' in tracked objects.
> 
> I know that doing call_rcu() from trace_preempt is ill advised,
> but I still want to understand why call_rcu corrupts the memory.

Hmmm...  This is what I would expect if you invoked call_rcu()
(or kfree_rcu(), for that matter) on some memory, then freed it
before the grace period ended.  This would cause the same problems
as any other use-after-free error.  Might this be happening?
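
To illustrate the pattern I mean, here is deliberately broken code
(names made up):

	struct foo {
		int data;
		struct rcu_head rh;
	};

	static void foo_cb(struct rcu_head *head)
	{
		kfree(container_of(head, struct foo, rh));
	}

	static void broken_free(struct foo *p)
	{
		call_rcu(&p->rh, foo_cb);
		kfree(p);	/* BUG: frees p while &p->rh is still on
				 * RCU's per-CPU callback list, so RCU later
				 * touches freed memory, foo_cb() double-frees
				 * it, and ODEBUG loses track of the rcu_head. */
	}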

> Attaching a patch that I'm using for debugging.
> It's doing recursion preemption check, so number of nested call_rcu
> is no more than 2.

Oh...  One important thing is that both call_rcu() and kfree_rcu()
use per-CPU variables, managing a per-CPU linked list.  This is why
they disable interrupts.  If you do another call_rcu() in the middle
of the first one in just the wrong place, you will have two entities
concurrently manipulating the same linked list, which will not go well.

Maybe mark call_rcu() and the things it calls as notrace?  Or you
could maintain a separate per-CPU linked list that gathered up the
stuff to be kfree()ed after a grace period, and some time later
feed them to kfree_rcu()?
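
A minimal sketch of that second idea, just to be concrete (the helper
names, the extra llist_node member, and the workqueue choice below are
all made up):

	#include <linux/llist.h>
	#include <linux/workqueue.h>
	#include <linux/slab.h>

	struct elem {
		u64 pad[32];
		struct rcu_head rcu;
		struct llist_node llnode;	/* staging-list linkage */
	};

	static DEFINE_PER_CPU(struct llist_head, stage_list);

	/* Unsafe context (e.g. trace_preempt_on()): no call_rcu() here,
	 * just push onto a lockless per-CPU list.  llist_add() is a
	 * cmpxchg loop, so it is safe from irq and even NMI context. */
	static void stage_for_free(struct elem *e)
	{
		llist_add(&e->llnode, this_cpu_ptr(&stage_list));
	}

	/* Later, from a safe context, hand everything to kfree_rcu(). */
	static void drain_stage_lists(struct work_struct *work)
	{
		struct llist_node *head;
		struct elem *e, *tmp;
		int cpu;

		for_each_possible_cpu(cpu) {
			head = llist_del_all(per_cpu_ptr(&stage_list, cpu));
			llist_for_each_entry_safe(e, tmp, head, llnode)
				kfree_rcu(e, rcu);
		}
	}
	static DECLARE_DELAYED_WORK(stage_drain_work, drain_stage_lists);
	/* ... and schedule_delayed_work(&stage_drain_work, HZ) periodically
	 * from somewhere safe. */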

> Also if I replace kfree_rcu is this patch with a regular kfree,
> all works fine.
> 
> I'm seeing this crashes in VM with _single_ cpu.
> Kernel is built with CONFIG_PREEMPT, CONFIG_PREEMPT_TRACER and
> CONFIG_DEBUG_OBJECTS_RCU_HEAD.

No surprise -- when you have lists hanging off of per-CPU variables,
it only takes one CPU to tangle the lists.

> Also interesting that size of
> struct elem {
>        u64 pad[32];
>        struct rcu_head rcu;
> };
> that I'm using in kmalloc/kfree_rcu changes the crash.
> If padding is zero, kernel just locksup, if pad[1] I see
> one type of odebug warnings, if pad[32] - another.

The usual consequence of racing a pair of callback insertions on the
same CPU would be that one of them gets leaked, and possible all
subsequent callbacks.  So the lockup is no surprise.  And there are a
lot of other assumptions in nearby code paths about only one execution
at a time from a given CPU.

> Any advise on where to look is greatly appreciated.

What I don't understand is exactly what you are trying to do.  Have more
complex tracers that dynamically allocate memory?  If so, having a per-CPU
list that stages memory to be freed so that it can be passed to call_rcu()
in a safe environment might make sense.  Of course, that list would need
to be managed carefully!

Or am I missing the point of the code below?

							Thanx, Paul

> Thanks!
> 

> diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
> index 8523ea345f2b..89433a83dd2d 100644
> --- a/kernel/trace/trace_irqsoff.c
> +++ b/kernel/trace/trace_irqsoff.c
> @@ -13,6 +13,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/module.h>
>  #include <linux/ftrace.h>
> +#include <linux/slab.h>
>  
>  #include "trace.h"
>  
> @@ -510,8 +511,42 @@ EXPORT_SYMBOL(trace_hardirqs_off_caller);
>  #endif /*  CONFIG_IRQSOFF_TRACER */
>  
>  #ifdef CONFIG_PREEMPT_TRACER
> +struct elem {
> +	u64 pad[32];
> +	struct rcu_head rcu;
> +};
> +
> +static DEFINE_PER_CPU(int, prog_active);
> +static void * test_alloc(void)
> +{
> +	struct elem *e = NULL;
> +
> +	if (in_nmi())
> +		return e;
> +
> +	preempt_disable_notrace();
> +	if (unlikely(__this_cpu_inc_return(prog_active) != 1))
> +		goto out;
> +
> +	rcu_read_lock();
> +	e = kmalloc(sizeof(struct elem), GFP_ATOMIC);
> +	rcu_read_unlock();
> +	if (!e)
> +		goto out;
> +
> +	kfree_rcu(e, rcu);
> +out:
> +	__this_cpu_dec(prog_active);
> +	preempt_enable_no_resched_notrace();
> +	return e;
> +}
> +
>  void trace_preempt_on(unsigned long a0, unsigned long a1)
>  {
> +	void * buf = 0;
> +	static int cnt = 0;
> +	if (cnt++ > 3000000)
> +		buf = test_alloc();
>  	if (preempt_trace() && !irq_trace())
>  		stop_critical_timing(a0, a1);
>  }



* Re: call_rcu from trace_preempt
  2015-06-15 23:07 ` Paul E. McKenney
@ 2015-06-16  1:09   ` Alexei Starovoitov
  2015-06-16  2:14     ` Paul E. McKenney
  0 siblings, 1 reply; 41+ messages in thread
From: Alexei Starovoitov @ 2015-06-16  1:09 UTC (permalink / raw)
  To: paulmck; +Cc: Daniel Wagner, LKML

On 6/15/15 4:07 PM, Paul E. McKenney wrote:
>
> Oh...  One important thing is that both call_rcu() and kfree_rcu()
> use per-CPU variables, managing a per-CPU linked list.  This is why
> they disable interrupts.  If you do another call_rcu() in the middle
> of the first one in just the wrong place, you will have two entities
> concurrently manipulating the same linked list, which will not go well.

yes. I'm trying to find that 'wrong place'.
The trace.patch is doing kmalloc/kfree_rcu for every preempt_enable.
So any spin_unlock called by the first call_rcu will trigger a
2nd, recursive call_rcu.
But as far as I can understand the rcu code, that looks ok everywhere:
call_rcu
   debug_rcu_head_[un]queue
     debug_object_activate
       spin_unlock

and debug_rcu_head* seems to be called from safe places
where local_irq is enabled.

> Maybe mark call_rcu() and the things it calls as notrace?  Or you
> could maintain a separate per-CPU linked list that gathered up the
> stuff to be kfree()ed after a grace period, and some time later
> feed them to kfree_rcu()?

yeah, I can think of this or 10 other ways to fix it within the
kprobe+bpf area, but I think something like call_rcu_notrace()
may be a better solution.
Or maybe a single generic 'fix' for call_rcu will be enough if
it doesn't affect all other users.

> The usual consequence of racing a pair of callback insertions on the
> same CPU would be that one of them gets leaked, and possible all
> subsequent callbacks.  So the lockup is no surprise.  And there are a
> lot of other assumptions in nearby code paths about only one execution
> at a time from a given CPU.

yes, I don't think calling a 2nd call_rcu from preempt_enable violates
these assumptions. local_irq does its job. No extra stuff is called when
interrupts are disabled.

>> Any advise on where to look is greatly appreciated.
>
> What I don't understand is exactly what you are trying to do.  Have more
> complex tracers that dynamically allocate memory?  If so, having a per-CPU
> list that stages memory to be freed so that it can be passed to call_rcu()
> in a safe environment might make sense.  Of course, that list would need
> to be managed carefully!

yes. We tried to compute the time the kernel spends between
preempt_disable->preempt_enable and plot a histogram of latencies.

> Or am I missing the point of the code below?

this trace.patch is a reproducer of the call_rcu crashes; it does:
preempt_enable
   trace_preempt_on
     kfree_call_rcu

The real call stack is:
preempt_enable
   trace_preempt_on
     kprobe_int3_handler
       trace_call_bpf
         bpf_map_update_elem
           htab_map_update_elem
             kfree_call_rcu



* Re: call_rcu from trace_preempt
  2015-06-16  1:09   ` Alexei Starovoitov
@ 2015-06-16  2:14     ` Paul E. McKenney
  2015-06-16  5:45       ` Alexei Starovoitov
  0 siblings, 1 reply; 41+ messages in thread
From: Paul E. McKenney @ 2015-06-16  2:14 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Daniel Wagner, LKML

On Mon, Jun 15, 2015 at 06:09:56PM -0700, Alexei Starovoitov wrote:
> On 6/15/15 4:07 PM, Paul E. McKenney wrote:
> >
> >Oh...  One important thing is that both call_rcu() and kfree_rcu()
> >use per-CPU variables, managing a per-CPU linked list.  This is why
> >they disable interrupts.  If you do another call_rcu() in the middle
> >of the first one in just the wrong place, you will have two entities
> >concurrently manipulating the same linked list, which will not go well.
> 
> yes. I'm trying to find that 'wrong place'.
> The trace.patch is doing kmalloc/kfree_rcu for every preempt_enable.
> So any spin_unlock called by first call_rcu will be triggering
> 2nd recursive to call_rcu.
> But as far as I could understand rcu code that looks ok everywhere.
> call_rcu
>   debug_rcu_head_[un]queue
>     debug_object_activate
>       spin_unlock
> 
> and debug_rcu_head* seems to be called from safe places
> where local_irq is enabled.

I do sympathize, but your own testing does demonstrate that it is
very much not OK.  ;-)

> >Maybe mark call_rcu() and the things it calls as notrace?  Or you
> >could maintain a separate per-CPU linked list that gathered up the
> >stuff to be kfree()ed after a grace period, and some time later
> >feed them to kfree_rcu()?
> 
> yeah, I can think of this or 10 other ways to fix it within
> kprobe+bpf area, but I think something like call_rcu_notrace()
> may be a better solution.
> Or may be single generic 'fix' for call_rcu will be enough if
> it doesn't affect all other users.

Why do you believe that it is better to fix it within call_rcu()?

> >The usual consequence of racing a pair of callback insertions on the
> >same CPU would be that one of them gets leaked, and possible all
> >subsequent callbacks.  So the lockup is no surprise.  And there are a
> >lot of other assumptions in nearby code paths about only one execution
> >at a time from a given CPU.
> 
> yes, I don't think calling 2nd call_rcu from preempt_enable violates
> this assumptions. local_irq does it job. No extra stuff is called when
> interrupts are disabled.

Perhaps you are self-deadlocking within __call_rcu_core().  If you have
not already done so, please try running with CONFIG_PROVE_LOCKING=y.
(Yes, I see your point about not calling extra stuff when interrupts
are disabled, and I remember that __call_rcu() avoids that path when
interrupts are disabled, but -something- is clearly going wrong!
Or maybe something momentarily enables interrupts somewhere, and RCU
has just been getting lucky.  Or...)

> >>Any advise on where to look is greatly appreciated.
> >
> >What I don't understand is exactly what you are trying to do.  Have more
> >complex tracers that dynamically allocate memory?  If so, having a per-CPU
> >list that stages memory to be freed so that it can be passed to call_rcu()
> >in a safe environment might make sense.  Of course, that list would need
> >to be managed carefully!
> 
> yes. We tried to compute the time the kernel spends between
> preempt_disable->preempt_enable and plot a histogram of latencies.
> 
> >Or am I missing the point of the code below?
> 
> this trace.patch is reproducer of call_rcu crashes that doing:
> preempt_enable
>   trace_preempt_on
>     kfree_call_rcu
> 
> The real call stack is:
> preempt_enable
>   trace_preempt_on
>     kprobe_int3_handler
>       trace_call_bpf
>         bpf_map_update_elem
>           htab_map_update_elem
>             kree_call_rcu

I suspect that your problem may range quite a bit further than just
call_rcu().  For example, in your stack trace, you have a recursive
call to debug_object_activate(), which might not be such good thing.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-16  2:14     ` Paul E. McKenney
@ 2015-06-16  5:45       ` Alexei Starovoitov
  2015-06-16  6:06         ` Daniel Wagner
  2015-06-16 12:27         ` Paul E. McKenney
  0 siblings, 2 replies; 41+ messages in thread
From: Alexei Starovoitov @ 2015-06-16  5:45 UTC (permalink / raw)
  To: paulmck; +Cc: Daniel Wagner, LKML

On 6/15/15 7:14 PM, Paul E. McKenney wrote:
>
> Why do you believe that it is better to fix it within call_rcu()?

found it:
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8cf7304b2867..a3be09d482ae 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -935,9 +935,9 @@ bool notrace rcu_is_watching(void)
  {
         bool ret;

-       preempt_disable();
+       preempt_disable_notrace();
         ret = __rcu_is_watching();
-       preempt_enable();
+       preempt_enable_notrace();
         return ret;
  }

the rcu_is_watching() and __rcu_is_watching() are already marked
notrace, so imo it's a good 'fix'.
What was happening is that the above preempt_enable was triggering a
recursive call_rcu that was indeed messing up the 'rdp' that had been
prepared by __call_rcu, before __call_rcu_core could use it.

btw, I also noticed that the local_irq_save done by note_gp_changes
is partially redundant. In the __call_rcu_core path the irqs are
already disabled.

> Perhaps you are self-deadlocking within __call_rcu_core().  If you have
> not already done so, please try running with CONFIG_PROVE_LOCKING=y.

yes, I had CONFIG_PROVE_LOCKING on.

> I suspect that your problem may range quite a bit further than just
> call_rcu().  For example, in your stack trace, you have a recursive
> call to debug_object_activate(), which might not be such good thing.

nope :) recursive debug_object_activate() is fine.
with the above 'fix' the trace.patch is now passing.

Why am I digging into all of this? Well, to find out when
it's safe to finally do call_rcu. If I use the deferred kfree
approach in bpf maps, I need to know when it's safe to finally
call_rcu, and it's not an easy answer.
kprobes can potentially be placed in any part of the call_rcu stack,
so things can get messy quickly. So it helps to understand the call_rcu
logic well enough to come up with a good solution.



* Re: call_rcu from trace_preempt
  2015-06-16  5:45       ` Alexei Starovoitov
@ 2015-06-16  6:06         ` Daniel Wagner
  2015-06-16  6:25           ` Alexei Starovoitov
  2015-06-16 12:27         ` Paul E. McKenney
  1 sibling, 1 reply; 41+ messages in thread
From: Daniel Wagner @ 2015-06-16  6:06 UTC (permalink / raw)
  To: Alexei Starovoitov, paulmck; +Cc: LKML



On 06/16/2015 07:45 AM, Alexei Starovoitov wrote:
> On 6/15/15 7:14 PM, Paul E. McKenney wrote:
>>
>> Why do you believe that it is better to fix it within call_rcu()?
> 
> found it:
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8cf7304b2867..a3be09d482ae 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -935,9 +935,9 @@ bool notrace rcu_is_watching(void)
>  {
>         bool ret;
> 
> -       preempt_disable();
> +       preempt_disable_notrace();
>         ret = __rcu_is_watching();
> -       preempt_enable();
> +       preempt_enable_notrace();
>         return ret;
>  }
> 
> the rcu_is_watching() and __rcu_is_watching() are already marked
> notrace, so imo it's a good 'fix'.
> What was happening is that the above preempt_enable was triggering
> recursive call_rcu that was indeed messing 'rdp' that was
> prepared by __call_rcu and before __call_rcu_core could use that.
> 
> btw, also noticed that local_irq_save done by note_gp_changes
> is partially redundant. In __call_rcu_core path the irqs are
> already disabled.
> 
>> Perhaps you are self-deadlocking within __call_rcu_core().  If you have
>> not already done so, please try running with CONFIG_PROVE_LOCKING=y.
> 
> yes, I had CONFIG_PROVE_LOCKING on.
> 
>> I suspect that your problem may range quite a bit further than just
>> call_rcu().  For example, in your stack trace, you have a recursive
>> call to debug_object_activate(), which might not be such good thing.
> 
> nope :) recursive debug_object_activate() is fine.
> with the above 'fix' the trace.patch is now passing.

It still crashes for me with the original test program

[  145.908013]  [<ffffffff810d1da1>] ? __rcu_reclaim+0x101/0x3d0
[  145.908013]  [<ffffffff810d1ca0>] ? rcu_barrier_func+0x250/0x250
[  145.908013]  [<ffffffff810abc03>] ? trace_hardirqs_on_caller+0xf3/0x240
[  145.908013]  [<ffffffff810d9afa>] rcu_do_batch+0x2ea/0x6b0
[  145.908013]  [<ffffffff8151a803>] ? __this_cpu_preempt_check+0x13/0x20
[  145.908013]  [<ffffffff810abc03>] ? trace_hardirqs_on_caller+0xf3/0x240
[  145.921092]  [<ffffffff81b6f072>] ? _raw_spin_unlock_irqrestore+0x42/0x80
[  145.921092]  [<ffffffff810d2794>] ? rcu_report_qs_rnp+0x1b4/0x3f0
[  145.921092]  [<ffffffff8151a803>] ? __this_cpu_preempt_check+0x13/0x20
[  145.921092]  [<ffffffff810d9f96>] rcu_process_callbacks+0xd6/0x6a0
[  145.921092]  [<ffffffff81060042>] __do_softirq+0xe2/0x670
[  145.921092]  [<ffffffff810605ef>] run_ksoftirqd+0x1f/0x60
[  145.921092]  [<ffffffff81081843>] smpboot_thread_fn+0x193/0x2a0
[  145.921092]  [<ffffffff810816b0>] ? sort_range+0x30/0x30
[  145.921092]  [<ffffffff8107da12>] kthread+0xf2/0x110
[  145.921092]  [<ffffffff81b6a523>] ? wait_for_completion+0xc3/0x120
[  145.921092]  [<ffffffff8108a77b>] ? preempt_count_sub+0xab/0xf0
[  145.921092]  [<ffffffff8107d920>] ? kthread_create_on_node+0x240/0x240
[  145.921092]  [<ffffffff81b6ff02>] ret_from_fork+0x42/0x70
[  145.921092]  [<ffffffff8107d920>] ? kthread_create_on_node+0x240/0x240

> Why I'm digging into all of these? Well, to find out when
> it's safe to finally do call_rcu. If I will use deferred kfree
> approach in bpf maps, I need to know when it's safe to finally
> call_rcu and it's not an easy answer.
> kprobes potentially can be placed in any part of call_rcu stack,
> so things can go messy quickly. So it helps to understand the call_rcu
> logic well enough to come up with good solution. 



* Re: call_rcu from trace_preempt
  2015-06-16  6:06         ` Daniel Wagner
@ 2015-06-16  6:25           ` Alexei Starovoitov
  2015-06-16  6:34             ` Daniel Wagner
  0 siblings, 1 reply; 41+ messages in thread
From: Alexei Starovoitov @ 2015-06-16  6:25 UTC (permalink / raw)
  To: Daniel Wagner, paulmck; +Cc: LKML

On 6/15/15 11:06 PM, Daniel Wagner wrote:
>> with the above 'fix' the trace.patch is now passing.
> It still crashes for me with the original test program
>
> [  145.908013]  [<ffffffff810d1da1>] ? __rcu_reclaim+0x101/0x3d0
> [  145.908013]  [<ffffffff810d1ca0>] ? rcu_barrier_func+0x250/0x250
> [  145.908013]  [<ffffffff810abc03>] ? trace_hardirqs_on_caller+0xf3/0x240
> [  145.908013]  [<ffffffff810d9afa>] rcu_do_batch+0x2ea/0x6b0

yes. full bpf test still crashes.
That's why I said trace.patch is passing ;)
There is something else in there. One 'fix' at a time.


* Re: call_rcu from trace_preempt
  2015-06-16  6:25           ` Alexei Starovoitov
@ 2015-06-16  6:34             ` Daniel Wagner
  2015-06-16  6:46               ` Alexei Starovoitov
  0 siblings, 1 reply; 41+ messages in thread
From: Daniel Wagner @ 2015-06-16  6:34 UTC (permalink / raw)
  To: Alexei Starovoitov, paulmck; +Cc: LKML

On 06/16/2015 08:25 AM, Alexei Starovoitov wrote:
> On 6/15/15 11:06 PM, Daniel Wagner wrote:
>>> with the above 'fix' the trace.patch is now passing.
>> It still crashes for me with the original test program
>>
>> [  145.908013]  [<ffffffff810d1da1>] ? __rcu_reclaim+0x101/0x3d0
>> [  145.908013]  [<ffffffff810d1ca0>] ? rcu_barrier_func+0x250/0x250
>> [  145.908013]  [<ffffffff810abc03>] ?
>> trace_hardirqs_on_caller+0xf3/0x240
>> [  145.908013]  [<ffffffff810d9afa>] rcu_do_batch+0x2ea/0x6b0
> 
> yes. full bpf test still crashes.
> That's why I said trace.patch is passing ;)
> There is something else in there. One 'fix' at a time.

Ah, sorry, I read it as working now :) Anyway, I'll keep looking
as well.

Yesterday I wrote a small torture program for the map
implementation, just to rule out memory corruption there.
Are you interested in it? If yes, I could clean it up a bit.

It is based on Paul's rcu torturing (finally I can write
this word without a typo :))

Here is a rough/dirty version:

#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/err.h>
#include <linux/module.h>
#include <linux/random.h>
#include <linux/bpf.h>
#include <linux/torture.h>
#include <linux/slab.h>

struct bpf_test {
	struct bpf_map *map;
	const struct bpf_map_ops *ops;
};

static struct bpf_test test;
static struct task_struct **torture_reads;
static struct task_struct **torture_writes;

torture_param(int, nreads, -1, "Number of BPF torture threads");
torture_param(int, nwrites, -1, "Number of BPF torture threads");
torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)");
torture_param(int, onoff_interval, 0, "Time between CPU hotplugs (s), 0=disable");
torture_param(int, shutdown_secs, 0, "Shutdown time (s), <= zero to disable.");
torture_param(int, stutter, 5, "Number of seconds to run/halt test");
torture_param(bool, verbose, true,
	     "Enable verbose debugging printk()s");

static int nrealreads;
static int nrealwrites;

static int torture_runnable = 1;
module_param(torture_runnable, int, 0444);
MODULE_PARM_DESC(torture_runnable, "Start rcutorture at boot");

static char *torture_type = "bpf";
module_param(torture_type, charp, 0444);
MODULE_PARM_DESC(torture_type, "Type of BPF to torture ( ...)");

static int bpf_torture_writes(void *arg)
{
	int i, v;
	do {
		v = 1234;
		rcu_read_lock();
		for (i = 0; i < 1000; i++) {
			test.ops->map_update_elem(test.map, &v, &v, BPF_ANY);
			v = next_pseudo_random32(v);
		}
		rcu_read_unlock();

		stutter_wait("bpf_torture_write");
	} while (!torture_must_stop());

	torture_kthread_stopping("bpf_torture_write");
	return 0;
}

static int bpf_torture_reads(void *arg)
{
	int *r;
	int key, next_key;

	do {
		key = -1;
		rcu_read_lock();
		while (test.ops->map_get_next_key(test.map, &key, &next_key) == 0) {
			r = test.ops->map_lookup_elem(test.map, &next_key);
			test.ops->map_delete_elem(test.map, &next_key);
			key = next_key;
		}
		rcu_read_unlock();

		stutter_wait("bpf_torture_read");
	} while (!torture_must_stop());

	torture_kthread_stopping("bpf_torture_read");
	return 0;
}

static void bpf_torture_cleanup(void)
{
	int i;

	torture_cleanup_begin();

	if (torture_reads) {
		for (i = 0; i < nrealreads; i++)
			torture_stop_kthread(rcu_torture_reads,
					torture_reads[i]);
		kfree(torture_reads);
	}

	if (torture_writes) {
		for (i = 0; i < nrealwrites; i++)
			torture_stop_kthread(rcu_torture_writes,
					torture_writes[i]);
		kfree(torture_writes);
	}

	if (test.map) {
		test.ops->map_free(test.map);
		test.map = NULL;
	}

	torture_cleanup_end();
}

static int __init bpf_torture_init(void)
{
	union bpf_attr attr;
	int i, err;

	if (nreads >= 0) {
		nrealreads = nreads;
	} else {
		nrealreads = num_online_cpus() - 1;
		if (nrealreads <= 0)
			nrealreads = 1;
	}

	if (nwrites >= 0) {
		nrealwrites = nwrites;
	} else {
		nrealwrites = num_online_cpus() - 1;
		if (nrealwrites <= 0)
			nrealwrites = 1;
	}

	if (!torture_init_begin(torture_type, verbose, &torture_runnable))
		return -EBUSY;

	torture_reads = kzalloc(nrealreads * sizeof(torture_reads[0]), GFP_KERNEL);
	if (!torture_reads)
		return -ENOMEM;
	torture_writes = kzalloc(nrealwrites * sizeof(torture_writes[0]), GFP_KERNEL);
	if (!torture_writes)
		return -ENOMEM;

	attr.map_type = BPF_MAP_TYPE_HASH;
	attr.key_size = sizeof(u32);
	attr.value_size = sizeof(u32);
	attr.max_entries = 1000;

	test.ops = bpf_get_map_ops(attr.map_type);
	if (!test.ops)
		return -ENOENT;

	test.map = test.ops->map_alloc(&attr);
	if (IS_ERR(test.map))
		return PTR_ERR(test.map);

	err = torture_stutter_init(stutter * HZ);
	if (err)
		return err;

	for (i = 0; i < nrealreads; i++) {
		err = torture_create_kthread(bpf_torture_reads,
					NULL, torture_reads[i]);
		if (err)
			return err;
	}

	for (i = 0; i < nrealwrites; i++) {
		err = torture_create_kthread(bpf_torture_writes,
					NULL, torture_writes[i]);
		if (err)
			return err;
	}

	err = torture_shutdown_init(shutdown_secs, bpf_torture_cleanup);
	if (err)
		return err;

	err = torture_onoff_init(onoff_holdoff * HZ, onoff_interval * HZ);
	if (err)
		return err;

	torture_init_end();

	return 0;
}

module_init(bpf_torture_init);
module_exit(bpf_torture_cleanup);

MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Daniel Wagner <daniel.wagner@bmw-carit.de>");
MODULE_DESCRIPTION("BPF torture tests");


* Re: call_rcu from trace_preempt
  2015-06-16  6:34             ` Daniel Wagner
@ 2015-06-16  6:46               ` Alexei Starovoitov
  2015-06-16  6:54                 ` Daniel Wagner
  0 siblings, 1 reply; 41+ messages in thread
From: Alexei Starovoitov @ 2015-06-16  6:46 UTC (permalink / raw)
  To: Daniel Wagner, paulmck; +Cc: LKML

On 6/15/15 11:34 PM, Daniel Wagner wrote:
> On 06/16/2015 08:25 AM, Alexei Starovoitov wrote:
>> On 6/15/15 11:06 PM, Daniel Wagner wrote:
>>>> with the above 'fix' the trace.patch is now passing.
>>> It still crashes for me with the original test program
>>>
>>> [  145.908013]  [<ffffffff810d1da1>] ? __rcu_reclaim+0x101/0x3d0
>>> [  145.908013]  [<ffffffff810d1ca0>] ? rcu_barrier_func+0x250/0x250
>>> [  145.908013]  [<ffffffff810abc03>] ?
>>> trace_hardirqs_on_caller+0xf3/0x240
>>> [  145.908013]  [<ffffffff810d9afa>] rcu_do_batch+0x2ea/0x6b0
>>
>> yes. full bpf test still crashes.
>> That's why I said trace.patch is passing ;)
>> There is something else in there. One 'fix' at a time.
>
> Ah, sorry, I read it is working now :) Anyway, I'll keep looking
> as well.
>
> Yesterday I wrote a small torture program for the map
> implementation. Just to rule out memory corruption there.
> Are you interested in it? If yes I could clean it a bit.

of course!
We already have samples/bpf/test_maps.c that stresses map
access from user space and lib/test_bpf.c that stresses the JIT
and interpreter from the kernel.
Looking at your test, I think it doesn't buy us much doing it
from the kernel?
If so, I think it would be great to add it to test_maps.c

Will read it more carefully tomorrow.



* Re: call_rcu from trace_preempt
  2015-06-16  6:46               ` Alexei Starovoitov
@ 2015-06-16  6:54                 ` Daniel Wagner
  0 siblings, 0 replies; 41+ messages in thread
From: Daniel Wagner @ 2015-06-16  6:54 UTC (permalink / raw)
  To: Alexei Starovoitov, paulmck; +Cc: LKML

On 06/16/2015 08:46 AM, Alexei Starovoitov wrote:
> On 6/15/15 11:34 PM, Daniel Wagner wrote:
>> On 06/16/2015 08:25 AM, Alexei Starovoitov wrote:
>>> On 6/15/15 11:06 PM, Daniel Wagner wrote:
>>>>> with the above 'fix' the trace.patch is now passing.
>>>> It still crashes for me with the original test program
>>>>
>>>> [  145.908013]  [<ffffffff810d1da1>] ? __rcu_reclaim+0x101/0x3d0
>>>> [  145.908013]  [<ffffffff810d1ca0>] ? rcu_barrier_func+0x250/0x250
>>>> [  145.908013]  [<ffffffff810abc03>] ?
>>>> trace_hardirqs_on_caller+0xf3/0x240
>>>> [  145.908013]  [<ffffffff810d9afa>] rcu_do_batch+0x2ea/0x6b0
>>>
>>> yes. full bpf test still crashes.
>>> That's why I said trace.patch is passing ;)
>>> There is something else in there. One 'fix' at a time.
>>
>> Ah, sorry, I read it is working now :) Anyway, I'll keep looking
>> as well.
>>
>> Yesterday I wrote a small torture program for the map
>> implementation. Just to rule out memory corruption there.
>> Are you interested in it? If yes I could clean it a bit.
> 
> of course!
> We already have samples/bpf/test_maps.c that stresses map
> access from user space and lib/test_bpf.c that stress JIT
> and interpreter from the kernel.
> Looking at your test, I think it doesn't buy as much doing it
> from the kernel?

No, it doesn't really buy you anything compared to test_maps.c. I was
just ignorant of the fact that test_maps.c exists. My goal was to reduce
the problem to the smallest unit, and therefore I tried to do it inside the
kernel without any additional layers.

> If so, I think would be great to add it to test_maps.c
> 
> Will read it more carefully tomorrow.

No need, test_maps.c is far better.


* Re: call_rcu from trace_preempt
  2015-06-16  5:45       ` Alexei Starovoitov
  2015-06-16  6:06         ` Daniel Wagner
@ 2015-06-16 12:27         ` Paul E. McKenney
  2015-06-16 12:38           ` Daniel Wagner
  2015-06-16 15:37           ` Steven Rostedt
  1 sibling, 2 replies; 41+ messages in thread
From: Paul E. McKenney @ 2015-06-16 12:27 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Daniel Wagner, LKML, rostedt

On Mon, Jun 15, 2015 at 10:45:05PM -0700, Alexei Starovoitov wrote:
> On 6/15/15 7:14 PM, Paul E. McKenney wrote:
> >
> >Why do you believe that it is better to fix it within call_rcu()?
> 
> found it:
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8cf7304b2867..a3be09d482ae 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -935,9 +935,9 @@ bool notrace rcu_is_watching(void)
>  {
>         bool ret;
> 
> -       preempt_disable();
> +       preempt_disable_notrace();
>         ret = __rcu_is_watching();
> -       preempt_enable();
> +       preempt_enable_notrace();
>         return ret;
>  }
> 
> the rcu_is_watching() and __rcu_is_watching() are already marked
> notrace, so imo it's a good 'fix'.
> What was happening is that the above preempt_enable was triggering
> recursive call_rcu that was indeed messing 'rdp' that was
> prepared by __call_rcu and before __call_rcu_core could use that.

> btw, also noticed that local_irq_save done by note_gp_changes
> is partially redundant. In __call_rcu_core path the irqs are
> already disabled.

But you said earlier that nothing happened when interrupts were
disabled.  And interrupts are disabled across the call to
rcu_is_watching() in __call_rcu_core().  So why did those calls
to preempt_disable() and preempt_enable() cause trouble?

That said, the patch looks inoffensive to me, adding Steven for his
trace expertise.

Still, I do need to understand what was really happening.  Did interrupts
get enabled somehow?  Or is your code that ignores calls when interrupts
are disabled incomplete in some way?  Something else?

> >Perhaps you are self-deadlocking within __call_rcu_core().  If you have
> >not already done so, please try running with CONFIG_PROVE_LOCKING=y.
> 
> yes, I had CONFIG_PROVE_LOCKING on.

Good!  ;-)

> >I suspect that your problem may range quite a bit further than just
> >call_rcu().  For example, in your stack trace, you have a recursive
> >call to debug_object_activate(), which might not be such good thing.
> 
> nope :) recursive debug_object_activate() is fine.
> with the above 'fix' the trace.patch is now passing.
> 
> Why I'm digging into all of these? Well, to find out when
> it's safe to finally do call_rcu. If I will use deferred kfree
> approach in bpf maps, I need to know when it's safe to finally
> call_rcu and it's not an easy answer.

Given that reentrant calls to call_rcu() and/or kfree_rcu() were not
in any way considered during design and implementation, it is not a
surprise that the answer is not easy.  The reason I need to understand
what your code does in interrupt-disabled situations is to work out
whether or not it makes sense to agree to support reentrancy in call_rcu()
and kfree_rcu().

> kprobes potentially can be placed in any part of call_rcu stack,
> so things can go messy quickly. So it helps to understand the call_rcu
> logic well enough to come up with good solution.

Indeed, I do have some concerns about that sort of thing, as it is not
at all clear that designing call_rcu() and kfree_rcu() for unrestricted
reentrancy is a win.

							Thanx, Paul



* Re: call_rcu from trace_preempt
  2015-06-16 12:27         ` Paul E. McKenney
@ 2015-06-16 12:38           ` Daniel Wagner
  2015-06-16 14:16             ` Paul E. McKenney
                               ` (2 more replies)
  2015-06-16 15:37           ` Steven Rostedt
  1 sibling, 3 replies; 41+ messages in thread
From: Daniel Wagner @ 2015-06-16 12:38 UTC (permalink / raw)
  To: paulmck, Alexei Starovoitov; +Cc: Daniel Wagner, LKML, rostedt

On 06/16/2015 02:27 PM, Paul E. McKenney wrote:
> On Mon, Jun 15, 2015 at 10:45:05PM -0700, Alexei Starovoitov wrote:
>> On 6/15/15 7:14 PM, Paul E. McKenney wrote:
>>>
>>> Why do you believe that it is better to fix it within call_rcu()?
>>
>> found it:
>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> index 8cf7304b2867..a3be09d482ae 100644
>> --- a/kernel/rcu/tree.c
>> +++ b/kernel/rcu/tree.c
>> @@ -935,9 +935,9 @@ bool notrace rcu_is_watching(void)
>>  {
>>         bool ret;
>>
>> -       preempt_disable();
>> +       preempt_disable_notrace();
>>         ret = __rcu_is_watching();
>> -       preempt_enable();
>> +       preempt_enable_notrace();
>>         return ret;
>>  }
>>
>> the rcu_is_watching() and __rcu_is_watching() are already marked
>> notrace, so imo it's a good 'fix'.
>> What was happening is that the above preempt_enable was triggering
>> recursive call_rcu that was indeed messing 'rdp' that was
>> prepared by __call_rcu and before __call_rcu_core could use that.
> 
>> btw, also noticed that local_irq_save done by note_gp_changes
>> is partially redundant. In __call_rcu_core path the irqs are
>> already disabled.
> 
> But you said earlier that nothing happened when interrupts were
> disabled.  And interrupts are disabled across the call to
> rcu_is_watching() in __call_rcu_core().  So why did those calls
> to preempt_disable() and preempt_enable() cause trouble?

Just for the record: Using a thread for freeing the memory is curing the
problem without the need to modify rcu_is_watching.

Here is the hack:

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 83c209d..8d73be3 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -13,6 +13,8 @@
 #include <linux/jhash.h>
 #include <linux/filter.h>
 #include <linux/vmalloc.h>
+#include <linux/kthread.h>
+#include <linux/spinlock.h>

 struct bpf_htab {
 	struct bpf_map map;
@@ -27,10 +29,14 @@ struct bpf_htab {
 struct htab_elem {
 	struct hlist_node hash_node;
 	struct rcu_head rcu;
+	struct list_head list;
 	u32 hash;
 	char key[0] __aligned(8);
 };

+static LIST_HEAD(elem_freelist);
+static DEFINE_SPINLOCK(elem_freelist_lock);
+
 /* Called from syscall */
 static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 {
@@ -228,6 +234,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 	memcpy(l_new->key + round_up(key_size, 8), value, map->value_size);

 	l_new->hash = htab_map_hash(l_new->key, key_size);
+	INIT_LIST_HEAD(&l_new->list);

 	/* bpf_map_update_elem() can be called in_irq() */
 	spin_lock_irqsave(&htab->lock, flags);
@@ -300,11 +307,17 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
 	if (l) {
 		hlist_del_rcu(&l->hash_node);
 		htab->count--;
-		kfree_rcu(l, rcu);
+		/* kfree_rcu(l, rcu); */
 		ret = 0;
 	}

 	spin_unlock_irqrestore(&htab->lock, flags);
+
+	if (l) {
+		spin_lock_irqsave(&elem_freelist_lock, flags);
+		list_add(&l->list, &elem_freelist);
+		spin_unlock_irqrestore(&elem_freelist_lock, flags);
+	}
 	return ret;
 }

@@ -359,9 +372,31 @@ static struct bpf_map_type_list htab_type __read_mostly = {
 	.type = BPF_MAP_TYPE_HASH,
 };

+static int free_thread(void *arg)
+{
+	unsigned long flags;
+	struct htab_elem *l;
+
+	while (!kthread_should_stop()) {
+		spin_lock_irqsave(&elem_freelist_lock, flags);
+		while (!list_empty(&elem_freelist)) {
+			l = list_entry(elem_freelist.next,
+				struct htab_elem, list);
+			list_del(&l->list);
+			kfree(l);
+		}
+		spin_unlock_irqrestore(&elem_freelist_lock, flags);
+	}
+
+	return 0;
+}
+
 static int __init register_htab_map(void)
 {
 	bpf_register_map_type(&htab_type);
+
+	kthread_run(free_thread, NULL, "free_thread");
+
 	return 0;
 }
 late_initcall(register_htab_map);


* Re: call_rcu from trace_preempt
  2015-06-16 12:38           ` Daniel Wagner
@ 2015-06-16 14:16             ` Paul E. McKenney
  2015-06-16 15:43               ` Steven Rostedt
  2015-06-16 15:41             ` Steven Rostedt
  2015-06-16 17:20             ` Alexei Starovoitov
  2 siblings, 1 reply; 41+ messages in thread
From: Paul E. McKenney @ 2015-06-16 14:16 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Alexei Starovoitov, Daniel Wagner, LKML, rostedt

On Tue, Jun 16, 2015 at 02:38:53PM +0200, Daniel Wagner wrote:
> On 06/16/2015 02:27 PM, Paul E. McKenney wrote:
> > On Mon, Jun 15, 2015 at 10:45:05PM -0700, Alexei Starovoitov wrote:
> >> On 6/15/15 7:14 PM, Paul E. McKenney wrote:
> >>>
> >>> Why do you believe that it is better to fix it within call_rcu()?
> >>
> >> found it:
> >> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >> index 8cf7304b2867..a3be09d482ae 100644
> >> --- a/kernel/rcu/tree.c
> >> +++ b/kernel/rcu/tree.c
> >> @@ -935,9 +935,9 @@ bool notrace rcu_is_watching(void)
> >>  {
> >>         bool ret;
> >>
> >> -       preempt_disable();
> >> +       preempt_disable_notrace();
> >>         ret = __rcu_is_watching();
> >> -       preempt_enable();
> >> +       preempt_enable_notrace();
> >>         return ret;
> >>  }
> >>
> >> the rcu_is_watching() and __rcu_is_watching() are already marked
> >> notrace, so imo it's a good 'fix'.
> >> What was happening is that the above preempt_enable was triggering
> >> recursive call_rcu that was indeed messing 'rdp' that was
> >> prepared by __call_rcu and before __call_rcu_core could use that.
> > 
> >> btw, also noticed that local_irq_save done by note_gp_changes
> >> is partially redundant. In __call_rcu_core path the irqs are
> >> already disabled.
> > 
> > But you said earlier that nothing happened when interrupts were
> > disabled.  And interrupts are disabled across the call to
> > rcu_is_watching() in __call_rcu_core().  So why did those calls
> > to preempt_disable() and preempt_enable() cause trouble?
> 
> Just for the record: Using a thread for freeing the memory is curing the
> problem without the need to modify rcu_is_watching.

I must confess to liking this approach better than guaranteeing full-up
reentrancy in call_rcu() and kfree_rcu().  ;-)

							Thanx, Paul

> Here is the hack:
> 
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 83c209d..8d73be3 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -13,6 +13,8 @@
>  #include <linux/jhash.h>
>  #include <linux/filter.h>
>  #include <linux/vmalloc.h>
> +#include <linux/kthread.h>
> +#include <linux/spinlock.h>
> 
>  struct bpf_htab {
>  	struct bpf_map map;
> @@ -27,10 +29,14 @@ struct bpf_htab {
>  struct htab_elem {
>  	struct hlist_node hash_node;
>  	struct rcu_head rcu;
> +	struct list_head list;
>  	u32 hash;
>  	char key[0] __aligned(8);
>  };
> 
> +static LIST_HEAD(elem_freelist);
> +static DEFINE_SPINLOCK(elem_freelist_lock);
> +
>  /* Called from syscall */
>  static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
>  {
> @@ -228,6 +234,7 @@ static int htab_map_update_elem(struct bpf_map *map,
> void *key, void *value,
>  	memcpy(l_new->key + round_up(key_size, 8), value, map->value_size);
> 
>  	l_new->hash = htab_map_hash(l_new->key, key_size);
> +	INIT_LIST_HEAD(&l_new->list);
> 
>  	/* bpf_map_update_elem() can be called in_irq() */
>  	spin_lock_irqsave(&htab->lock, flags);
> @@ -300,11 +307,17 @@ static int htab_map_delete_elem(struct bpf_map
> *map, void *key)
>  	if (l) {
>  		hlist_del_rcu(&l->hash_node);
>  		htab->count--;
> -		kfree_rcu(l, rcu);
> +		/* kfree_rcu(l, rcu); */
>  		ret = 0;
>  	}
> 
>  	spin_unlock_irqrestore(&htab->lock, flags);
> +
> +	if (l) {
> +		spin_lock_irqsave(&elem_freelist_lock, flags);
> +		list_add(&l->list, &elem_freelist);
> +		spin_unlock_irqrestore(&elem_freelist_lock, flags);
> +	}
>  	return ret;
>  }
> 
> @@ -359,9 +372,31 @@ static struct bpf_map_type_list htab_type
> __read_mostly = {
>  	.type = BPF_MAP_TYPE_HASH,
>  };
> 
> +static int free_thread(void *arg)
> +{
> +	unsigned long flags;
> +	struct htab_elem *l;
> +
> +	while (!kthread_should_stop()) {
> +		spin_lock_irqsave(&elem_freelist_lock, flags);
> +		while (!list_empty(&elem_freelist)) {
> +			l = list_entry(elem_freelist.next,
> +				struct htab_elem, list);
> +			list_del(&l->list);
> +			kfree(l);
> +		}
> +		spin_unlock_irqrestore(&elem_freelist_lock, flags);
> +	}
> +
> +	return 0;
> +}
> +
>  static int __init register_htab_map(void)
>  {
>  	bpf_register_map_type(&htab_type);
> +
> +	kthread_run(free_thread, NULL, "free_thread");
> +
>  	return 0;
>  }
>  late_initcall(register_htab_map);
> 



* Re: call_rcu from trace_preempt
  2015-06-16 12:27         ` Paul E. McKenney
  2015-06-16 12:38           ` Daniel Wagner
@ 2015-06-16 15:37           ` Steven Rostedt
  2015-06-16 16:05             ` Paul E. McKenney
  1 sibling, 1 reply; 41+ messages in thread
From: Steven Rostedt @ 2015-06-16 15:37 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Alexei Starovoitov, Daniel Wagner, LKML

On Tue, 16 Jun 2015 05:27:33 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Mon, Jun 15, 2015 at 10:45:05PM -0700, Alexei Starovoitov wrote:
> > On 6/15/15 7:14 PM, Paul E. McKenney wrote:
> > >
> > >Why do you believe that it is better to fix it within call_rcu()?
> > 
> > found it:
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 8cf7304b2867..a3be09d482ae 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -935,9 +935,9 @@ bool notrace rcu_is_watching(void)
> >  {
> >         bool ret;
> > 
> > -       preempt_disable();
> > +       preempt_disable_notrace();
> >         ret = __rcu_is_watching();
> > -       preempt_enable();
> > +       preempt_enable_notrace();
> >         return ret;
> >  }
> > 
> > the rcu_is_watching() and __rcu_is_watching() are already marked
> > notrace, so imo it's a good 'fix'.
> > What was happening is that the above preempt_enable was triggering
> > recursive call_rcu that was indeed messing 'rdp' that was
> > prepared by __call_rcu and before __call_rcu_core could use that.
> 
> > btw, also noticed that local_irq_save done by note_gp_changes
> > is partially redundant. In __call_rcu_core path the irqs are
> > already disabled.
> 

If rcu_is_watching() and __rcu_is_watching() are both marked as
notrace, it makes sense to use preempt_disable/enable_notrace() as it
otherwise defeats the purpose of the notrace markers on rcu_is_watching.

That is regardless of what the rest of this thread is about.

-- Steve


* Re: call_rcu from trace_preempt
  2015-06-16 12:38           ` Daniel Wagner
  2015-06-16 14:16             ` Paul E. McKenney
@ 2015-06-16 15:41             ` Steven Rostedt
  2015-06-16 15:52               ` Steven Rostedt
  2015-06-16 17:11               ` Daniel Wagner
  2015-06-16 17:20             ` Alexei Starovoitov
  2 siblings, 2 replies; 41+ messages in thread
From: Steven Rostedt @ 2015-06-16 15:41 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: paulmck, Alexei Starovoitov, Daniel Wagner, LKML

On Tue, 16 Jun 2015 14:38:53 +0200
Daniel Wagner <wagi@monom.org> wrote:

> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 83c209d..8d73be3 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -13,6 +13,8 @@
>  #include <linux/jhash.h>
>  #include <linux/filter.h>
>  #include <linux/vmalloc.h>
> +#include <linux/kthread.h>
> +#include <linux/spinlock.h>
> 
>  struct bpf_htab {
>  	struct bpf_map map;
> @@ -27,10 +29,14 @@ struct bpf_htab {
>  struct htab_elem {
>  	struct hlist_node hash_node;
>  	struct rcu_head rcu;
> +	struct list_head list;
>  	u32 hash;
>  	char key[0] __aligned(8);
>  };
> 
> +static LIST_HEAD(elem_freelist);
> +static DEFINE_SPINLOCK(elem_freelist_lock);
> +
>  /* Called from syscall */
>  static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
>  {
> @@ -228,6 +234,7 @@ static int htab_map_update_elem(struct bpf_map *map,
> void *key, void *value,
>  	memcpy(l_new->key + round_up(key_size, 8), value, map->value_size);
> 
>  	l_new->hash = htab_map_hash(l_new->key, key_size);
> +	INIT_LIST_HEAD(&l_new->list);
> 
>  	/* bpf_map_update_elem() can be called in_irq() */
>  	spin_lock_irqsave(&htab->lock, flags);
> @@ -300,11 +307,17 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
>  	if (l) {
>  		hlist_del_rcu(&l->hash_node);
>  		htab->count--;
> -		kfree_rcu(l, rcu);
> +		/* kfree_rcu(l, rcu); */

So this kfree_rcu() is only being used to defer a free, and has nothing
to do with having to free 'l' from rcu?

>  		ret = 0;
>  	}
> 
>  	spin_unlock_irqrestore(&htab->lock, flags);
> +
> +	if (l) {
> +		spin_lock_irqsave(&elem_freelist_lock, flags);
> +		list_add(&l->list, &elem_freelist);
> +		spin_unlock_irqrestore(&elem_freelist_lock, flags);
> +	}
>  	return ret;
>  }
> 
> @@ -359,9 +372,31 @@ static struct bpf_map_type_list htab_type __read_mostly = {
>  	.type = BPF_MAP_TYPE_HASH,
>  };
> 
> +static int free_thread(void *arg)
> +{
> +	unsigned long flags;
> +	struct htab_elem *l;
> +
> +	while (!kthread_should_stop()) {
> +		spin_lock_irqsave(&elem_freelist_lock, flags);
> +		while (!list_empty(&elem_freelist)) {
> +			l = list_entry(elem_freelist.next,
> +				struct htab_elem, list);
> +			list_del(&l->list);
> +			kfree(l);
> +		}
> +		spin_unlock_irqrestore(&elem_freelist_lock, flags);

Wow! This is burning up CPU isn't it?

If you just need to delay the kfree, why not use irq_work for that job?

-- Steve

> +	}
> +
> +	return 0;
> +}
> +
>  static int __init register_htab_map(void)
>  {
>  	bpf_register_map_type(&htab_type);
> +
> +	kthread_run(free_thread, NULL, "free_thread");
> +
>  	return 0;
>  }
>  late_initcall(register_htab_map);


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-16 14:16             ` Paul E. McKenney
@ 2015-06-16 15:43               ` Steven Rostedt
  2015-06-16 16:07                 ` Paul E. McKenney
  0 siblings, 1 reply; 41+ messages in thread
From: Steven Rostedt @ 2015-06-16 15:43 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Daniel Wagner, Alexei Starovoitov, Daniel Wagner, LKML

On Tue, 16 Jun 2015 07:16:26 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> > Just for the record: Using a thread for freeing the memory is curing the
> > problem without the need to modify rcu_is_watching.
> 
> I must confess to liking this approach better than guaranteeing full-up
> reentrancy in call_rcu() and kfree_rcu().  ;-)
> 

Then reentrancy must be really bad if you prefer a spinning thread that
polls constantly just to free an item ;-)

-- Steve

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-16 15:41             ` Steven Rostedt
@ 2015-06-16 15:52               ` Steven Rostedt
  2015-06-16 17:11               ` Daniel Wagner
  1 sibling, 0 replies; 41+ messages in thread
From: Steven Rostedt @ 2015-06-16 15:52 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: paulmck, Alexei Starovoitov, Daniel Wagner, LKML

On Tue, 16 Jun 2015 11:41:51 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> >  	if (l) {
> >  		hlist_del_rcu(&l->hash_node);
> >  		htab->count--;
> > -		kfree_rcu(l, rcu);
> > +		/* kfree_rcu(l, rcu); */
> 
> So this kfree_rcu() is only being used to defer a free, and has nothing
> to do with having to free 'l' from rcu?
> 

If you still require kfree_rcu() but cannot call it here, still use an
irq_work, and have the irq_work handler call kfree_rcu().

-- Steve

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-16 15:37           ` Steven Rostedt
@ 2015-06-16 16:05             ` Paul E. McKenney
  2015-06-16 17:14               ` Alexei Starovoitov
  0 siblings, 1 reply; 41+ messages in thread
From: Paul E. McKenney @ 2015-06-16 16:05 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Alexei Starovoitov, Daniel Wagner, LKML

On Tue, Jun 16, 2015 at 11:37:38AM -0400, Steven Rostedt wrote:
> On Tue, 16 Jun 2015 05:27:33 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Mon, Jun 15, 2015 at 10:45:05PM -0700, Alexei Starovoitov wrote:
> > > On 6/15/15 7:14 PM, Paul E. McKenney wrote:
> > > >
> > > >Why do you believe that it is better to fix it within call_rcu()?
> > > 
> > > found it:
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 8cf7304b2867..a3be09d482ae 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -935,9 +935,9 @@ bool notrace rcu_is_watching(void)
> > >  {
> > >         bool ret;
> > > 
> > > -       preempt_disable();
> > > +       preempt_disable_notrace();
> > >         ret = __rcu_is_watching();
> > > -       preempt_enable();
> > > +       preempt_enable_notrace();
> > >         return ret;
> > >  }
> > > 
> > > the rcu_is_watching() and __rcu_is_watching() are already marked
> > > notrace, so imo it's a good 'fix'.
> > > What was happening is that the above preempt_enable was triggering
> > > recursive call_rcu that was indeed messing 'rdp' that was
> > > prepared by __call_rcu and before __call_rcu_core could use that.
> > 
> > > btw, also noticed that local_irq_save done by note_gp_changes
> > > is partially redundant. In __call_rcu_core path the irqs are
> > > already disabled.
> > 
> 
> If rcu_is_watching() and __rcu_is_watching() are both marked as
> notrace, it makes sense to use preempt_disable/enable_notrace() as it
> otherwise defeats the purpose of the notrace markers on rcu_is_watching.
> 
> That is regardless of what the rest of this thread is about.

Good enough!  Alexei, are you OK with my adding your Signed-off-by
to the above patch?  (Still not sold on reentrant call_rcu() and
kfree_rcu(), but getting notrace set up correctly is worthwhile.)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-16 15:43               ` Steven Rostedt
@ 2015-06-16 16:07                 ` Paul E. McKenney
  2015-06-16 17:13                   ` Daniel Wagner
  0 siblings, 1 reply; 41+ messages in thread
From: Paul E. McKenney @ 2015-06-16 16:07 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Daniel Wagner, Alexei Starovoitov, Daniel Wagner, LKML

On Tue, Jun 16, 2015 at 11:43:42AM -0400, Steven Rostedt wrote:
> On Tue, 16 Jun 2015 07:16:26 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > > Just for the record: Using a thread for freeing the memory is curing the
> > > problem without the need to modify rcu_is_watching.
> > 
> > I must confess to liking this approach better than guaranteeing full-up
> > reentrancy in call_rcu() and kfree_rcu().  ;-)
> 
> Then reentrancy must be really bad if you prefer a spinning thread that
> polls constantly just to free an item ;-)

I was (perhaps naively) assuming that they would use a less aggressive
approach at some point.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-16 15:41             ` Steven Rostedt
  2015-06-16 15:52               ` Steven Rostedt
@ 2015-06-16 17:11               ` Daniel Wagner
  1 sibling, 0 replies; 41+ messages in thread
From: Daniel Wagner @ 2015-06-16 17:11 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: paulmck, Alexei Starovoitov, Daniel Wagner, LKML

On 06/16/2015 05:41 PM, Steven Rostedt wrote:
> On Tue, 16 Jun 2015 14:38:53 +0200
> Daniel Wagner <wagi@monom.org> wrote:
>> *map, void *key)
>>   	if (l) {
>>   		hlist_del_rcu(&l->hash_node);
>>   		htab->count--;
>> -		kfree_rcu(l, rcu);
>> +		/* kfree_rcu(l, rcu); */
>
> So this kfree_rcu() is only being used to defer a free, and has nothing
> to do with having to free 'l' from rcu?

Not 100% sure but I got the impression kfree_rcu only defers the free.

>> +static int free_thread(void *arg)
>> +{
>> +	unsigned long flags;
>> +	struct htab_elem *l;
>> +
>> +	while (!kthread_should_stop()) {
>> +		spin_lock_irqsave(&elem_freelist_lock, flags);
>> +		while (!list_empty(&elem_freelist)) {
>> +			l = list_entry(elem_freelist.next,
>> +				struct htab_elem, list);
>> +			list_del(&l->list);
>> +			kfree(l);
>> +		}
>> +		spin_unlock_irqrestore(&elem_freelist_lock, flags);
>
> Wow! This is burning up CPU isn't it?

Sure, this is a very busy thread :) I was just experimenting to see if
deferring it to a thread would paper over the problem.

> If you just need to delay the kfree, why not use irq_work for that job?

Good point. I will try that tomorrow.

cheers,
daniel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-16 16:07                 ` Paul E. McKenney
@ 2015-06-16 17:13                   ` Daniel Wagner
  0 siblings, 0 replies; 41+ messages in thread
From: Daniel Wagner @ 2015-06-16 17:13 UTC (permalink / raw)
  To: paulmck, Steven Rostedt; +Cc: Alexei Starovoitov, Daniel Wagner, LKML

On 06/16/2015 06:07 PM, Paul E. McKenney wrote:
> On Tue, Jun 16, 2015 at 11:43:42AM -0400, Steven Rostedt wrote:
>> On Tue, 16 Jun 2015 07:16:26 -0700
>> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>>
>>>> Just for the record: Using a thread for freeing the memory is curing the
>>>> problem without the need to modify rcu_is_watching.
>>>
>>> I must confess to liking this approach better than guaranteeing full-up
>>> reentrancy in call_rcu() and kfree_rcu().  ;-)
>>
>> Then reentrancy must be really bad if you prefer a spinning thread that
>> polls constantly just to free an item ;-)
>
> I was (perhaps naively) assuming that they would use a less aggressive
> approach at some point.  ;-)

Yes, this was just playing around :)


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-16 16:05             ` Paul E. McKenney
@ 2015-06-16 17:14               ` Alexei Starovoitov
  2015-06-16 17:39                 ` Paul E. McKenney
  0 siblings, 1 reply; 41+ messages in thread
From: Alexei Starovoitov @ 2015-06-16 17:14 UTC (permalink / raw)
  To: paulmck, Steven Rostedt; +Cc: Daniel Wagner, LKML

On 6/16/15 9:05 AM, Paul E. McKenney wrote:
> On Tue, Jun 16, 2015 at 11:37:38AM -0400, Steven Rostedt wrote:
>> On Tue, 16 Jun 2015 05:27:33 -0700
>> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>>
>>> On Mon, Jun 15, 2015 at 10:45:05PM -0700, Alexei Starovoitov wrote:
>>>> On 6/15/15 7:14 PM, Paul E. McKenney wrote:
>>>>>
>>>>> Why do you believe that it is better to fix it within call_rcu()?
>>>>
>>>> found it:
>>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>>> index 8cf7304b2867..a3be09d482ae 100644
>>>> --- a/kernel/rcu/tree.c
>>>> +++ b/kernel/rcu/tree.c
>>>> @@ -935,9 +935,9 @@ bool notrace rcu_is_watching(void)
>>>>   {
>>>>          bool ret;
>>>>
>>>> -       preempt_disable();
>>>> +       preempt_disable_notrace();
>>>>          ret = __rcu_is_watching();
>>>> -       preempt_enable();
>>>> +       preempt_enable_notrace();
>>>>          return ret;
>>>>   }
>>>>
>>>> the rcu_is_watching() and __rcu_is_watching() are already marked
>>>> notrace, so imo it's a good 'fix'.
>>>> What was happening is that the above preempt_enable was triggering
>>>> recursive call_rcu that was indeed messing 'rdp' that was
>>>> prepared by __call_rcu and before __call_rcu_core could use that.
>>>
>>>> btw, also noticed that local_irq_save done by note_gp_changes
>>>> is partially redundant. In __call_rcu_core path the irqs are
>>>> already disabled.
>>>
>>
>> If rcu_is_watching() and __rcu_is_watching() are both marked as
>> notrace, it makes sense to use preempt_disable/enable_notrace() as it
>> otherwise defeats the purpose of the notrace markers on rcu_is_watching.
>>
>> That is regardless of what the rest of this thread is about.
>
> Good enough!  Alexei, are you OK with my adding your Signed-off-by
> to the above patch?

sure.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

> (Still not sold on reentrant call_rcu() and
> kfree_rcu(), but getting notrace set up correctly is worthwhile.)

I'm not sold on it either. So far trying to understand
all consequences.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-16 12:38           ` Daniel Wagner
  2015-06-16 14:16             ` Paul E. McKenney
  2015-06-16 15:41             ` Steven Rostedt
@ 2015-06-16 17:20             ` Alexei Starovoitov
  2015-06-16 17:37               ` Steven Rostedt
  2015-06-17  8:11               ` Daniel Wagner
  2 siblings, 2 replies; 41+ messages in thread
From: Alexei Starovoitov @ 2015-06-16 17:20 UTC (permalink / raw)
  To: Daniel Wagner, paulmck; +Cc: Daniel Wagner, LKML, rostedt

On 6/16/15 5:38 AM, Daniel Wagner wrote:
> static int free_thread(void *arg)
> +{
> +	unsigned long flags;
> +	struct htab_elem *l;
> +
> +	while (!kthread_should_stop()) {
> +		spin_lock_irqsave(&elem_freelist_lock, flags);
> +		while (!list_empty(&elem_freelist)) {
> +			l = list_entry(elem_freelist.next,
> +				struct htab_elem, list);
> +			list_del(&l->list);
> +			kfree(l);

that's not right, since such thread defeats rcu protection of lookup.
We need either kfree_rcu/call_rcu or synchronize_rcu.
Obviously the former is preferred that's why I'm still digging into it.
Probably a thread that does kfree_rcu would be ok, but we shouldn't
be doing it unconditionally. For all networking programs and 99%
of tracing programs the existing code is fine and I don't want to
slow it down to tackle the corner case.
Extra spin_lock just to add it to the list is also quite costly.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-16 17:20             ` Alexei Starovoitov
@ 2015-06-16 17:37               ` Steven Rostedt
  2015-06-17  0:33                 ` Alexei Starovoitov
  2015-06-17  8:11               ` Daniel Wagner
  1 sibling, 1 reply; 41+ messages in thread
From: Steven Rostedt @ 2015-06-16 17:37 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Daniel Wagner, paulmck, Daniel Wagner, LKML

On Tue, 16 Jun 2015 10:20:05 -0700
Alexei Starovoitov <ast@plumgrid.com> wrote:

> On 6/16/15 5:38 AM, Daniel Wagner wrote:
> > static int free_thread(void *arg)
> > +{
> > +	unsigned long flags;
> > +	struct htab_elem *l;
> > +
> > +	while (!kthread_should_stop()) {
> > +		spin_lock_irqsave(&elem_freelist_lock, flags);
> > +		while (!list_empty(&elem_freelist)) {
> > +			l = list_entry(elem_freelist.next,
> > +				struct htab_elem, list);
> > +			list_del(&l->list);
> > +			kfree(l);
> 
> that's not right, since such thread defeats rcu protection of lookup.
> We need either kfree_rcu/call_rcu or synchronize_rcu.
> Obviously the former is preferred that's why I'm still digging into it.
> Probably a thread that does kfree_rcu would be ok, but we shouldn't
> be doing it unconditionally. For all networking programs and 99%
> of tracing programs the existing code is fine and I don't want to
> slow it down to tackle the corner case.
> Extra spin_lock just to add it to the list is also quite costly.

Use a irq_work() handler to do the kfree_rcu(), and use llist (lockless
list) to add items to the list.

-- Steve

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-16 17:14               ` Alexei Starovoitov
@ 2015-06-16 17:39                 ` Paul E. McKenney
  2015-06-16 18:57                   ` Steven Rostedt
  0 siblings, 1 reply; 41+ messages in thread
From: Paul E. McKenney @ 2015-06-16 17:39 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Steven Rostedt, Daniel Wagner, LKML

On Tue, Jun 16, 2015 at 10:14:08AM -0700, Alexei Starovoitov wrote:
> On 6/16/15 9:05 AM, Paul E. McKenney wrote:
> >On Tue, Jun 16, 2015 at 11:37:38AM -0400, Steven Rostedt wrote:
> >>On Tue, 16 Jun 2015 05:27:33 -0700
> >>"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> >>
> >>>On Mon, Jun 15, 2015 at 10:45:05PM -0700, Alexei Starovoitov wrote:
> >>>>On 6/15/15 7:14 PM, Paul E. McKenney wrote:
> >>>>>
> >>>>>Why do you believe that it is better to fix it within call_rcu()?
> >>>>
> >>>>found it:
> >>>>diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >>>>index 8cf7304b2867..a3be09d482ae 100644
> >>>>--- a/kernel/rcu/tree.c
> >>>>+++ b/kernel/rcu/tree.c
> >>>>@@ -935,9 +935,9 @@ bool notrace rcu_is_watching(void)
> >>>>  {
> >>>>         bool ret;
> >>>>
> >>>>-       preempt_disable();
> >>>>+       preempt_disable_notrace();
> >>>>         ret = __rcu_is_watching();
> >>>>-       preempt_enable();
> >>>>+       preempt_enable_notrace();
> >>>>         return ret;
> >>>>  }
> >>>>
> >>>>the rcu_is_watching() and __rcu_is_watching() are already marked
> >>>>notrace, so imo it's a good 'fix'.
> >>>>What was happening is that the above preempt_enable was triggering
> >>>>recursive call_rcu that was indeed messing 'rdp' that was
> >>>>prepared by __call_rcu and before __call_rcu_core could use that.
> >>>
> >>>>btw, also noticed that local_irq_save done by note_gp_changes
> >>>>is partially redundant. In __call_rcu_core path the irqs are
> >>>>already disabled.
> >>>
> >>
> >>If rcu_is_watching() and __rcu_is_watching() are both marked as
> >>notrace, it makes sense to use preempt_disable/enable_notrace() as it
> >>otherwise defeats the purpose of the notrace markers on rcu_is_watching.

And __rcu_is_watching() is marked notrace as well.

> >>That is regardless of what the rest of this thread is about.
> >
> >Good enough!  Alexei, are you OK with my adding your Signed-off-by
> >to the above patch?
> 
> sure.
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> 
> >(Still not sold on reentrant call_rcu() and
> >kfree_rcu(), but getting notrace set up correctly is worthwhile.)
> 
> I'm not sold on it either. So far trying to understand
> all consequences.

Here is the updated patch.  Steven, I added your "Acked-by" based
on your positive comments above, please let me know if you would
like me to remove it.

							Thanx, Paul

------------------------------------------------------------------------

commit 9611f225d383a2edbdf74ca7f00c8d0b1e56dc45
Author: Alexei Starovoitov <ast@plumgrid.com>
Date:   Tue Jun 16 10:35:18 2015 -0700

    rcu: Make rcu_is_watching() really notrace
    
    Although rcu_is_watching() is marked notrace, it invokes preempt_disable()
    and preempt_enable(), both of which can be traced.  This defeats the
    purpose of the notrace on rcu_is_watching(), so this commit substitutes
    preempt_disable_notrace() and preempt_enable_notrace().
    
    Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Acked-by: Steven Rostedt <rostedt@goodmis.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index fc0385380e97..c844ef3c2fae 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -973,9 +973,9 @@ bool notrace rcu_is_watching(void)
 {
 	bool ret;
 
-	preempt_disable();
+	preempt_disable_notrace();
 	ret = __rcu_is_watching();
-	preempt_enable();
+	preempt_enable_notrace();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(rcu_is_watching);


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-16 17:39                 ` Paul E. McKenney
@ 2015-06-16 18:57                   ` Steven Rostedt
  2015-06-16 19:20                     ` Paul E. McKenney
  0 siblings, 1 reply; 41+ messages in thread
From: Steven Rostedt @ 2015-06-16 18:57 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Alexei Starovoitov, Daniel Wagner, LKML

On Tue, 16 Jun 2015 10:39:42 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> > >>If rcu_is_watching() and __rcu_is_watching() are both marked as
> > >>notrace, it makes sense to use preempt_disable/enable_notrace() as it
> > >>otherwise defeats the purpose of the notrace markers on rcu_is_watching.
> 
> And __rcu_is_watching() is marked notrace as well.

Isn't that what I said?

> 
> > >>That is regardless of what the rest of this thread is about.
> > >
> > >Good enough!  Alexei, are you OK with my adding your Signed-off-by
> > >to the above patch?
> > 
> > sure.
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > 
> > >(Still not sold on reentrant call_rcu() and
> > >kfree_rcu(), but getting notrace set up correctly is worthwhile.)
> > 
> > I'm not sold on it either. So far trying to understand
> > all consequences.
> 
> Here is the updated patch.  Steven, I added your "Acked-by" based
> on your positive comments above, please let me know if you would
> like me to remove it.

I'm fine with it. But doesn't Acked-by go above Signed-off-by?

-- Steve

> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 9611f225d383a2edbdf74ca7f00c8d0b1e56dc45
> Author: Alexei Starovoitov <ast@plumgrid.com>
> Date:   Tue Jun 16 10:35:18 2015 -0700
> 
>     rcu: Make rcu_is_watching() really notrace
>     
>     Although rcu_is_watching() is marked notrace, it invokes preempt_disable()
>     and preempt_enable(), both of which can be traced.  This defeats the
>     purpose of the notrace on rcu_is_watching(), so this commit substitutes
>     preempt_disable_notrace() and preempt_enable_notrace().
>     
>     Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>     Acked-by: Steven Rostedt <rostedt@goodmis.org>
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index fc0385380e97..c844ef3c2fae 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -973,9 +973,9 @@ bool notrace rcu_is_watching(void)
>  {
>  	bool ret;
>  
> -	preempt_disable();
> +	preempt_disable_notrace();
>  	ret = __rcu_is_watching();
> -	preempt_enable();
> +	preempt_enable_notrace();
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(rcu_is_watching);


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-16 18:57                   ` Steven Rostedt
@ 2015-06-16 19:20                     ` Paul E. McKenney
  2015-06-16 19:29                       ` Steven Rostedt
  0 siblings, 1 reply; 41+ messages in thread
From: Paul E. McKenney @ 2015-06-16 19:20 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Alexei Starovoitov, Daniel Wagner, LKML

On Tue, Jun 16, 2015 at 02:57:44PM -0400, Steven Rostedt wrote:
> On Tue, 16 Jun 2015 10:39:42 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > > >>If rcu_is_watching() and __rcu_is_watching() are both marked as
> > > >>notrace, it makes sense to use preempt_disable/enable_notrace() as it
> > > >>otherwise defeats the purpose of the notrace markers on rcu_is_watching.
> > 
> > And __rcu_is_watching() is marked notrace as well.
> 
> Isn't that what I said?

You did say "if", so I checked.  ;-)

> > > >>That is regardless of what the rest of this thread is about.
> > > >
> > > >Good enough!  Alexei, are you OK with my adding your Signed-off-by
> > > >to the above patch?
> > > 
> > > sure.
> > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > 
> > > >(Still not sold on reentrant call_rcu() and
> > > >kfree_rcu(), but getting notrace set up correctly is worthwhile.)
> > > 
> > > I'm not sold on it either. So far trying to understand
> > > all consequences.
> > 
> > Here is the updated patch.  Steven, I added your "Acked-by" based
> > on your positive comments above, please let me know if you would
> > like me to remove it.
> 
> I'm fine with it. But doesn't Acked-by go above Signed-off-by?

I have done it both ways, usually in time order.

							Thanx, Paul

> -- Steve
> 
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit 9611f225d383a2edbdf74ca7f00c8d0b1e56dc45
> > Author: Alexei Starovoitov <ast@plumgrid.com>
> > Date:   Tue Jun 16 10:35:18 2015 -0700
> > 
> >     rcu: Make rcu_is_watching() really notrace
> >     
> >     Although rcu_is_watching() is marked notrace, it invokes preempt_disable()
> >     and preempt_enable(), both of which can be traced.  This defeats the
> >     purpose of the notrace on rcu_is_watching(), so this commit substitutes
> >     preempt_disable_notrace() and preempt_enable_notrace().
> >     
> >     Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >     Acked-by: Steven Rostedt <rostedt@goodmis.org>
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index fc0385380e97..c844ef3c2fae 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -973,9 +973,9 @@ bool notrace rcu_is_watching(void)
> >  {
> >  	bool ret;
> >  
> > -	preempt_disable();
> > +	preempt_disable_notrace();
> >  	ret = __rcu_is_watching();
> > -	preempt_enable();
> > +	preempt_enable_notrace();
> >  	return ret;
> >  }
> >  EXPORT_SYMBOL_GPL(rcu_is_watching);
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-16 19:20                     ` Paul E. McKenney
@ 2015-06-16 19:29                       ` Steven Rostedt
  2015-06-16 19:34                         ` Paul E. McKenney
  0 siblings, 1 reply; 41+ messages in thread
From: Steven Rostedt @ 2015-06-16 19:29 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Alexei Starovoitov, Daniel Wagner, LKML

On Tue, 16 Jun 2015 12:20:55 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> > I'm fine with it. But doesn't Acked-by go above Signed-off-by?
> 
> I have done it both ways, usually in time order.

You're either a Big-endian, or a Little-endian? You can't be both!

-- Steve

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-16 19:29                       ` Steven Rostedt
@ 2015-06-16 19:34                         ` Paul E. McKenney
  0 siblings, 0 replies; 41+ messages in thread
From: Paul E. McKenney @ 2015-06-16 19:34 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Alexei Starovoitov, Daniel Wagner, LKML

On Tue, Jun 16, 2015 at 03:29:25PM -0400, Steven Rostedt wrote:
> On Tue, 16 Jun 2015 12:20:55 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > > I'm fine with it. But doesn't Acked-by go above Signed-off-by?
> > 
> > I have done it both ways, usually in time order.
> 
> You're either a Big-endian, or a Little-endian? You can't be both!

We specify at boot time, on each boot.  Can be different for different
guest OSes.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-16 17:37               ` Steven Rostedt
@ 2015-06-17  0:33                 ` Alexei Starovoitov
  2015-06-17  0:47                   ` Steven Rostedt
  0 siblings, 1 reply; 41+ messages in thread
From: Alexei Starovoitov @ 2015-06-17  0:33 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Daniel Wagner, paulmck, Daniel Wagner, LKML

On 6/16/15 10:37 AM, Steven Rostedt wrote:
>>> +			kfree(l);
>> >
>> >that's not right, since such thread defeats rcu protection of lookup.
>> >We need either kfree_rcu/call_rcu or synchronize_rcu.
>> >Obviously the former is preferred that's why I'm still digging into it.
>> >Probably a thread that does kfree_rcu would be ok, but we shouldn't
>> >be doing it unconditionally. For all networking programs and 99%
>> >of tracing programs the existing code is fine and I don't want to
>> >slow it down to tackle the corner case.
>> >Extra spin_lock just to add it to the list is also quite costly.
> Use a irq_work() handler to do the kfree_rcu(), and use llist (lockless
> list) to add items to the list.

have been studying irq_work and llist... it will work, but it's quite
costly too. Every kfree_rcu will be replaced with irq_work_queue(),
which is irq_work_claim() with one lock_cmpxchg plus another
lock_cmpxchg in llist_add, plus another lock_cmpxchg for our own llist
of 'to be kfree_rcu-ed htab elements'. That's a lot.
There must be a better solution. Need to explore more.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-17  0:33                 ` Alexei Starovoitov
@ 2015-06-17  0:47                   ` Steven Rostedt
  2015-06-17  1:04                     ` Alexei Starovoitov
  0 siblings, 1 reply; 41+ messages in thread
From: Steven Rostedt @ 2015-06-17  0:47 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Daniel Wagner, paulmck, Daniel Wagner, LKML

On Tue, 16 Jun 2015 17:33:24 -0700
Alexei Starovoitov <ast@plumgrid.com> wrote:

> On 6/16/15 10:37 AM, Steven Rostedt wrote:
> >>> +			kfree(l);
> >> >
> >> >that's not right, since such thread defeats rcu protection of lookup.
> >> >We need either kfree_rcu/call_rcu or synchronize_rcu.
> >> >Obviously the former is preferred that's why I'm still digging into it.
> >> >Probably a thread that does kfree_rcu would be ok, but we shouldn't
> >> >be doing it unconditionally. For all networking programs and 99%
> >> >of tracing programs the existing code is fine and I don't want to
> >> >slow it down to tackle the corner case.
> >> >Extra spin_lock just to add it to the list is also quite costly.
> > Use a irq_work() handler to do the kfree_rcu(), and use llist (lockless
> > list) to add items to the list.
> 
> have been studying irq_work and llist... it will work, but it's quite
> costly too. Every kfree_rcu will be replaced with irq_work_queue(),
> which is irq_work_claim() with one lock_cmpxchg plus another
> lock_cmpxchg in llist_add, plus another lock_cmpxchg for our own llist
> of 'to be kfree_rcu-ed htab elements'. That's a lot.
> There must be a better solution. Need to explore more.

Do what I do in tracing. Use a bit (per cpu?) test.

Add the element to the list (that will be a cmpxchg, but I'm not sure
you can avoid it), then check the bit to see if the irq work has already
been activated. If not, then activate the irq work and set the bit.
Then you will not have any more cmpxchg in the fast path.

In your irq work handler, you clear the bit, process all the entries
until they are empty, check if the bit is set again, and repeat.
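
A minimal sketch of that scheme, in kernel C, could look like the following.
The free_batch struct, the defer_free() helper and the llist field in
htab_elem are hypothetical names rather than existing hashtab.c code, and
preemption/migration details are glossed over; here the llist_add() return
value (true only when the list was empty) plays the role of the bit, so the
fast path is one llist_add() plus at most one irq_work_queue() per batch:

#include <linux/kernel.h>
#include <linux/irq_work.h>
#include <linux/llist.h>
#include <linux/percpu.h>
#include <linux/rcupdate.h>

struct free_batch {
	struct llist_head list;		/* elements waiting for kfree_rcu() */
	struct irq_work work;
};
static DEFINE_PER_CPU(struct free_batch, free_batch);

static void free_batch_cb(struct irq_work *work)
{
	struct free_batch *b = container_of(work, struct free_batch, work);
	struct llist_node *n;
	struct htab_elem *e;

	/* kfree_rcu() only queues the RCU callback; the element is not
	 * freed before a grace period, so walking the list here is fine.
	 */
	n = llist_del_all(&b->list);
	llist_for_each_entry(e, n, llist)
		kfree_rcu(e, rcu);
}

static void defer_free(struct htab_elem *e)
{
	struct free_batch *b = this_cpu_ptr(&free_batch);

	/* llist_add() returns true only on the empty->non-empty
	 * transition, so the irq_work is queued at most once per batch.
	 */
	if (llist_add(&e->llist, &b->list))
		irq_work_queue(&b->work);
}

/* At init time, something like:
 *	for_each_possible_cpu(cpu)
 *		init_irq_work(&per_cpu(free_batch, cpu).work, free_batch_cb);
 */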

I haven't looked at the thread before I was added to the Cc, so I'm
answering this out of context.

-- Steve

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-17  0:47                   ` Steven Rostedt
@ 2015-06-17  1:04                     ` Alexei Starovoitov
  2015-06-17  1:19                       ` Steven Rostedt
  0 siblings, 1 reply; 41+ messages in thread
From: Alexei Starovoitov @ 2015-06-17  1:04 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Daniel Wagner, paulmck, Daniel Wagner, LKML

On 6/16/15 5:47 PM, Steven Rostedt wrote:
> Do what I do in tracing. Use a bit (per cpu?) test.
>
> Add the element to the list (that will be a cmpxchg, but I'm not sure
> you can avoid it), then check the bit to see if the irq work has already
> been activated. If not, then activate the irq work and set the bit.
> Then you will not have any more cmpxchg in the fast path.

you mean similar to what rb_wakeups() and friends are doing?
makes sense. starting to study it...


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-17  1:04                     ` Alexei Starovoitov
@ 2015-06-17  1:19                       ` Steven Rostedt
  0 siblings, 0 replies; 41+ messages in thread
From: Steven Rostedt @ 2015-06-17  1:19 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Daniel Wagner, paulmck, Daniel Wagner, LKML

On Tue, 16 Jun 2015 18:04:39 -0700
Alexei Starovoitov <ast@plumgrid.com> wrote:
> 
> you mean similar to what rb_wakeups() and friends are doing?
> makes sense. starting to study it...

Yes, I meant those functions. But yours would be slightly different, as it
would be the one calling the irq work that sets the flag, not a separate
entity (like the reader of the ring buffer).

I haven't fully thought it through, but I imagine it should still work.

-- Steve


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-16 17:20             ` Alexei Starovoitov
  2015-06-16 17:37               ` Steven Rostedt
@ 2015-06-17  8:11               ` Daniel Wagner
  2015-06-17  9:05                 ` Daniel Wagner
  1 sibling, 1 reply; 41+ messages in thread
From: Daniel Wagner @ 2015-06-17  8:11 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Wagner, paulmck; +Cc: LKML, rostedt

On 06/16/2015 07:20 PM, Alexei Starovoitov wrote:
> On 6/16/15 5:38 AM, Daniel Wagner wrote:
>> static int free_thread(void *arg)
>> +{
>> +    unsigned long flags;
>> +    struct htab_elem *l;
>> +
>> +    while (!kthread_should_stop()) {
>> +        spin_lock_irqsave(&elem_freelist_lock, flags);
>> +        while (!list_empty(&elem_freelist)) {
>> +            l = list_entry(elem_freelist.next,
>> +                struct htab_elem, list);
>> +            list_del(&l->list);
>> +            kfree(l);
> 
> that's not right, since such thread defeats rcu protection of lookup.
> We need either kfree_rcu/call_rcu or synchronize_rcu.

D'oh, I have not seen the elephant in the room.

/me grabs a brown bag.

> Obviously the former is preferred that's why I'm still digging into it.
> Probably a thread that does kfree_rcu would be ok, but we shouldn't
> be doing it unconditionally. For all networking programs and 99%
> of tracing programs the existing code is fine and I don't want to
> slow it down to tackle the corner case.
> Extra spin_lock just to add it to the list is also quite costly.

Anyway, I changed the above kfree() to a kfree_rcu() and it explodes
again, with the same stack trace we have seen.

Steven's suggestion deferring the work via irq_work results in the same
stack trace. (Now I get cold feets, without the nice heat from the CPU
busy looping...)

It looks like there is something else hidden somewhere.

cheers,
daniel


>From 3215ba9e4f9ecd0c7183f2400d9d579dcd5f4bc3 Mon Sep 17 00:00:00 2001
From: Daniel Wagner <daniel.wagner@bmw-carit.de>
Date: Wed, 17 Jun 2015 09:52:23 +0200
Subject: [PATCH] bpf: Defer kfree_rcu via irq_work

---
 kernel/bpf/hashtab.c | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 83c209d..f6f1702 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -13,6 +13,7 @@
 #include <linux/jhash.h>
 #include <linux/filter.h>
 #include <linux/vmalloc.h>
+#include <linux/irq_work.h>

 struct bpf_htab {
 	struct bpf_map map;
@@ -27,10 +28,39 @@ struct bpf_htab {
 struct htab_elem {
 	struct hlist_node hash_node;
 	struct rcu_head rcu;
+	struct llist_node llist;
 	u32 hash;
 	char key[0] __aligned(8);
 };

+static struct irq_work free_work;
+static LLIST_HEAD(free_list);
+static bool free_pending;
+
+static void free_work_cb(struct irq_work *work)
+{
+	struct llist_node *n;
+	struct htab_elem *e;
+
+	free_pending = false;
+
+	n = llist_del_all(&free_list);
+	if (!n)
+		return;
+
+	llist_for_each_entry(e, n, llist)
+		kfree_rcu(e, rcu);
+}
+
+static void free_elem(struct htab_elem *e)
+{
+	llist_add(&e->llist, &free_list);
+	if (!free_pending) {
+		free_pending = true;
+		irq_work_queue(&free_work);
+	}
+}
+
 /* Called from syscall */
 static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 {
@@ -262,7 +292,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 	hlist_add_head_rcu(&l_new->hash_node, head);
 	if (l_old) {
 		hlist_del_rcu(&l_old->hash_node);
-		kfree_rcu(l_old, rcu);
+		free_elem(l_old);
 	} else {
 		htab->count++;
 	}
@@ -300,7 +330,7 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
 	if (l) {
 		hlist_del_rcu(&l->hash_node);
 		htab->count--;
-		kfree_rcu(l, rcu);
+		free_elem(l);
 		ret = 0;
 	}

@@ -361,6 +391,7 @@ static struct bpf_map_type_list htab_type __read_mostly = {

 static int __init register_htab_map(void)
 {
+	init_irq_work(&free_work, free_work_cb);
 	bpf_register_map_type(&htab_type);
 	return 0;
 }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-17  8:11               ` Daniel Wagner
@ 2015-06-17  9:05                 ` Daniel Wagner
  2015-06-17 18:39                   ` Alexei Starovoitov
  0 siblings, 1 reply; 41+ messages in thread
From: Daniel Wagner @ 2015-06-17  9:05 UTC (permalink / raw)
  To: Alexei Starovoitov, paulmck; +Cc: LKML, rostedt

On 06/17/2015 10:11 AM, Daniel Wagner wrote:
> On 06/16/2015 07:20 PM, Alexei Starovoitov wrote:
>> On 6/16/15 5:38 AM, Daniel Wagner wrote:
>>> static int free_thread(void *arg)
>>> +{
>>> +    unsigned long flags;
>>> +    struct htab_elem *l;
>>> +
>>> +    while (!kthread_should_stop()) {
>>> +        spin_lock_irqsave(&elem_freelist_lock, flags);
>>> +        while (!list_empty(&elem_freelist)) {
>>> +            l = list_entry(elem_freelist.next,
>>> +                struct htab_elem, list);
>>> +            list_del(&l->list);
>>> +            kfree(l);
>>
> Anyway, I changed the above kfree() to a kfree_rcu() and it explodes
> again, with the same stack trace we have seen.

Correction. I did this without the is_rcu_watching() change. With that
patch applied it works fine again.

> Steven's suggestion deferring the work via irq_work results in the same
> stack trace. (Now I get cold feets, without the nice heat from the CPU
> busy looping...)

That one still not working. It also makes the system really really slow.
I guess I still do something completely wrong.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-17  9:05                 ` Daniel Wagner
@ 2015-06-17 18:39                   ` Alexei Starovoitov
  2015-06-17 20:37                     ` Paul E. McKenney
  0 siblings, 1 reply; 41+ messages in thread
From: Alexei Starovoitov @ 2015-06-17 18:39 UTC (permalink / raw)
  To: Daniel Wagner, paulmck; +Cc: LKML, rostedt

On 6/17/15 2:05 AM, Daniel Wagner wrote:
>> >Steven's suggestion deferring the work via irq_work results in the same
>> >stack trace. (Now I get cold feets, without the nice heat from the CPU
>> >busy looping...)
> That one still not working. It also makes the system really really slow.
> I guess I still do something completely wrong.

tried your irq_work patch. It indeed makes the whole system
unresponsive. Ctrl-C of hwlathist no longer works and
it runs out of memory in 20 sec or so of running hwlathist
on idle system (without parallel hackbench).
It looks that free_pending flag is racy, so I removed it,
but it didn't help.

Also I've tried all sort of other things in rcu including
add rcu_bpf similar to rcu_sched to make sure that recursive
call into call_rcu will not be messing rcu_preempt or rcu_sched
states and instead will be operating on rcu_bpf per-cpu states.
In theory that should have worked flawlessly and it sort-of did.
But multiple hackbench runs still managed to crash it.
So far I think the temp workaround is to stick with array maps
for probing such low level things like trace_preempt.
Note that pre-allocation of all elements in hash map also won't
help, since the problem here is some collision of call_rcu and
rcu_process_callbacks. I'm pretty sure that kfree_rcu with
rcu_is_watching patch is ready for this type of abuse.
The rcu_process_callbacks() path - no yet. I'm still analyzing it.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-17 18:39                   ` Alexei Starovoitov
@ 2015-06-17 20:37                     ` Paul E. McKenney
  2015-06-17 20:53                       ` Alexei Starovoitov
  0 siblings, 1 reply; 41+ messages in thread
From: Paul E. McKenney @ 2015-06-17 20:37 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Daniel Wagner, LKML, rostedt

On Wed, Jun 17, 2015 at 11:39:29AM -0700, Alexei Starovoitov wrote:
> On 6/17/15 2:05 AM, Daniel Wagner wrote:
> >>>Steven's suggestion deferring the work via irq_work results in the same
> >>>stack trace. (Now I get cold feets, without the nice heat from the CPU
> >>>busy looping...)
> >That one still not working. It also makes the system really really slow.
> >I guess I still do something completely wrong.
> 
> tried your irq_work patch. It indeed makes the whole system
> unresponsive. Ctrl-C of hwlathist no longer works and
> it runs out of memory in 20 sec or so of running hwlathist
> on idle system (without parallel hackbench).
> It looks that free_pending flag is racy, so I removed it,
> but it didn't help.
> 
> Also I've tried all sort of other things in rcu including
> add rcu_bpf similar to rcu_sched to make sure that recursive
> call into call_rcu will not be messing rcu_preempt or rcu_sched
> states and instead will be operating on rcu_bpf per-cpu states.
> In theory that should have worked flawlessly and it sort-of did.
> But multiple hackbench runs still managed to crash it.
> So far I think the temp workaround is to stick with array maps
> for probing such low level things like trace_preempt.
> Note that pre-allocation of all elements in hash map also won't
> help, since the problem here is some collision of call_rcu and
> rcu_process_callbacks. I'm pretty sure that kfree_rcu with
> rcu_is_watching patch is ready for this type of abuse.
> The rcu_process_callbacks() path - no yet. I'm still analyzing it.

How about if I just gave you a hook in __call_rcu() itself, just before
it returns, just after the local_irq_restore()?  You could maintain
recursion flags and test the environment, at some point handling any
memory that needed freeing.

The idea would be to use an atomic queue to accumulate the to-be-freed
data, then kfree_rcu() it in the hook if it was safe to do so.

I really don't trust the re-entrancy, especially not in the long term.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-17 20:37                     ` Paul E. McKenney
@ 2015-06-17 20:53                       ` Alexei Starovoitov
  2015-06-17 21:36                         ` Paul E. McKenney
  0 siblings, 1 reply; 41+ messages in thread
From: Alexei Starovoitov @ 2015-06-17 20:53 UTC (permalink / raw)
  To: paulmck; +Cc: Daniel Wagner, LKML, rostedt

On 6/17/15 1:37 PM, Paul E. McKenney wrote:
> On Wed, Jun 17, 2015 at 11:39:29AM -0700, Alexei Starovoitov wrote:
>> On 6/17/15 2:05 AM, Daniel Wagner wrote:
>>>>> Steven's suggestion deferring the work via irq_work results in the same
>>>>> stack trace. (Now I get cold feets, without the nice heat from the CPU
>>>>> busy looping...)
>>> That one still not working. It also makes the system really really slow.
>>> I guess I still do something completely wrong.
>>
>> tried your irq_work patch. It indeed makes the whole system
>> unresponsive. Ctrl-C of hwlathist no longer works and
>> it runs out of memory in 20 sec or so of running hwlathist
>> on idle system (without parallel hackbench).
>> It looks that free_pending flag is racy, so I removed it,
>> but it didn't help.
>>
>> Also I've tried all sort of other things in rcu including
>> add rcu_bpf similar to rcu_sched to make sure that recursive
>> call into call_rcu will not be messing rcu_preempt or rcu_sched
>> states and instead will be operating on rcu_bpf per-cpu states.
>> In theory that should have worked flawlessly and it sort-of did.
>> But multiple hackbench runs still managed to crash it.
>> So far I think the temp workaround is to stick with array maps
>> for probing such low level things like trace_preempt.
>> Note that pre-allocation of all elements in hash map also won't
>> help, since the problem here is some collision of call_rcu and
>> rcu_process_callbacks. I'm pretty sure that kfree_rcu with
>> rcu_is_watching patch is ready for this type of abuse.
>> The rcu_process_callbacks() path - no yet. I'm still analyzing it.
>
> How about if I just gave you a hook in __call_rcu() itself, just before
> it returns, just after the local_irq_restore()?  You could maintain
> recursion flags and test the environment, at some point handling any
> memory that needed freeing.
>
> The idea would be to use an atomic queue to accumulate the to-be-freed
> data, then kfree_rcu() it in the hook if it was safe to do so.

I'm not yet seeing how it will look. You mean I'll just locklessly
enqueue into some global llist and it will get kfree-d from
rcu_process_callbacks() ?
Something like kfree_rcu_lockless(ptr_to_be_freed) that
llist_adds and lets the rcu core know that something has to be freed?
I think such a feature would be very useful in general.
Or maybe kfree_rcu_this_cpu(ptr_to_be_freed) that uses
a per-cpu llist of 'to-be-kfreed' objects?
Performance will be great, with no need to embed rcu_head in
every data structure.

btw, irq_work suffers the same re-entrancy problem:
[   19.914910]  [<ffffffff8117c63c>] free_work_cb+0x2c/0x50
[   19.914910]  [<ffffffff81176624>] irq_work_run_list+0x44/0x70
[   19.914910]  [<ffffffff8117667e>] irq_work_run+0x2e/0x50
[   19.914910]  [<ffffffff81008d0e>] smp_irq_work_interrupt+0x2e/0x40
[   19.914910]  [<ffffffff8178dd20>] irq_work_interrupt+0x70/0x80
[   19.914910]  <EOI>  [<ffffffff813d3cec>] ? 
debug_object_active_state+0xfc/0x150
[   19.914910]  [<ffffffff813d3cec>] ? debug_object_active_state+0xfc/0x150
[   19.914910]  [<ffffffff8178c21b>] ? _raw_spin_unlock_irqrestore+0x4b/0x80
[   19.914910]  [<ffffffff8115e0a7>] ? trace_preempt_on+0x7/0x100
[   19.914910]  [<ffffffff810991c3>] ? preempt_count_sub+0x73/0xf0
[   19.914910]  [<ffffffff8178c21b>] _raw_spin_unlock_irqrestore+0x4b/0x80
[   19.914910]  [<ffffffff813d3cec>] debug_object_active_state+0xfc/0x150
[   19.914910]  [<ffffffff812035f0>] ? get_max_files+0x20/0x20
[   19.914910]  [<ffffffff810e2d2f>] __call_rcu.constprop.67+0x5f/0x350
[   19.914910]  [<ffffffff810e3097>] call_rcu+0x17/0x20
[   19.914910]  [<ffffffff81203843>] __fput+0x183/0x200

so if I do call_rcu from free_work_cb, it's equally bad
as calling call_rcu from trace_preempt_on



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-17 20:53                       ` Alexei Starovoitov
@ 2015-06-17 21:36                         ` Paul E. McKenney
  2015-06-17 23:58                           ` Alexei Starovoitov
  0 siblings, 1 reply; 41+ messages in thread
From: Paul E. McKenney @ 2015-06-17 21:36 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Daniel Wagner, LKML, rostedt

On Wed, Jun 17, 2015 at 01:53:17PM -0700, Alexei Starovoitov wrote:
> On 6/17/15 1:37 PM, Paul E. McKenney wrote:
> >On Wed, Jun 17, 2015 at 11:39:29AM -0700, Alexei Starovoitov wrote:
> >>On 6/17/15 2:05 AM, Daniel Wagner wrote:
> >>>>>Steven's suggestion deferring the work via irq_work results in the same
> >>>>>stack trace. (Now I get cold feets, without the nice heat from the CPU
> >>>>>busy looping...)
> >>>That one still not working. It also makes the system really really slow.
> >>>I guess I still do something completely wrong.
> >>
> >>tried your irq_work patch. It indeed makes the whole system
> >>unresponsive. Ctrl-C of hwlathist no longer works and
> >>it runs out of memory in 20 sec or so of running hwlathist
> >>on idle system (without parallel hackbench).
> >>It looks that free_pending flag is racy, so I removed it,
> >>but it didn't help.
> >>
> >>Also I've tried all sort of other things in rcu including
> >>add rcu_bpf similar to rcu_sched to make sure that recursive
> >>call into call_rcu will not be messing rcu_preempt or rcu_sched
> >>states and instead will be operating on rcu_bpf per-cpu states.
> >>In theory that should have worked flawlessly and it sort-of did.
> >>But multiple hackbench runs still managed to crash it.
> >>So far I think the temp workaround is to stick with array maps
> >>for probing such low level things like trace_preempt.
> >>Note that pre-allocation of all elements in hash map also won't
> >>help, since the problem here is some collision of call_rcu and
> >>rcu_process_callbacks. I'm pretty sure that kfree_rcu with
> >>rcu_is_watching patch is ready for this type of abuse.
> >>The rcu_process_callbacks() path - no yet. I'm still analyzing it.
> >
> >How about if I just gave you a hook in __call_rcu() itself, just before
> >it returns, just after the local_irq_restore()?  You could maintain
> >recursion flags and test the environment, at some point handling any
> >memory that needed freeing.
> >
> >The idea would be to use an atomic queue to accumulate the to-be-freed
> >data, then kfree_rcu() it in the hook if it was safe to do so.
> 
> I'm not yet seeing how it will look. You mean I'll just locklessly
> enqueue into some global llist and it will get kfree-d from
> rcu_process_callbacks() ?

Locklessly enqueue onto a per-CPU list, but yes.  The freeing is up to
you -- you get called just before exit from __call_rcu(), and get to
figure out what to do.

My guess would be if not in interrupt and not recursively invoked,
atomically remove all the elements from the list, then pass each to
kfree_rcu(), and finally let things take their course from there.
The llist APIs look like they would work.

> Something like kfree_rcu_lockless(ptr_to_be_freed) that
> llist_adds and lets the rcu core know that something has to be freed?
> I think such a feature would be very useful in general.
> Or maybe kfree_rcu_this_cpu(ptr_to_be_freed) that uses
> a per-cpu llist of 'to-be-kfreed' objects?
> Performance will be great, with no need to embed rcu_head in
> every data structure.

Well, you do need to have something in each element to allow them to be
tracked.  You could indeed use llist_add() to maintain the per-CPU list,
and then use llist_del_all() bulk-remove all the elements from the per-CPU
list.  You can then pass each element in turn to kfree_rcu().  And yes,
I am suggesting that you open-code this, as it is going to be easier to
handle your special case than to provide a fully general solution.  For
one thing, the general solution would require a full rcu_head to track
offset and next.  In contrast, you can special-case the offset.  And
ignore the overload special cases.
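
A rough sketch of that open-coded approach, assuming the map code pushes
elements onto a per-CPU llist (with an llist field in htab_elem as in
Daniel's patch earlier in the thread) and that such a hook exists near the
end of __call_rcu(); the hook itself, the deferred_frees list and the drain
function are hypothetical:

#include <linux/hardirq.h>
#include <linux/irqflags.h>
#include <linux/llist.h>
#include <linux/percpu.h>
#include <linux/rcupdate.h>

static DEFINE_PER_CPU(struct llist_head, deferred_frees);
static DEFINE_PER_CPU(int, draining);

/* Called from the hypothetical hook just before __call_rcu() returns. */
static void drain_deferred_frees(void)
{
	struct llist_node *n;
	struct htab_elem *e;
	unsigned long flags;

	if (in_interrupt())
		return;

	local_irq_save(flags);		/* also keeps us on this CPU */
	if (__this_cpu_read(draining)) {
		/* kfree_rcu() below recursed into __call_rcu() and back
		 * into this hook; let the outer invocation finish the job.
		 */
		local_irq_restore(flags);
		return;
	}
	__this_cpu_write(draining, 1);
	n = llist_del_all(this_cpu_ptr(&deferred_frees));
	llist_for_each_entry(e, n, llist)
		kfree_rcu(e, rcu);
	__this_cpu_write(draining, 0);
	local_irq_restore(flags);
}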

							Thanx, Paul

> btw, irq_work suffers the same re-entrancy problem:
> [   19.914910]  [<ffffffff8117c63c>] free_work_cb+0x2c/0x50
> [   19.914910]  [<ffffffff81176624>] irq_work_run_list+0x44/0x70
> [   19.914910]  [<ffffffff8117667e>] irq_work_run+0x2e/0x50
> [   19.914910]  [<ffffffff81008d0e>] smp_irq_work_interrupt+0x2e/0x40
> [   19.914910]  [<ffffffff8178dd20>] irq_work_interrupt+0x70/0x80
> [   19.914910]  <EOI>  [<ffffffff813d3cec>] ?
> debug_object_active_state+0xfc/0x150
> [   19.914910]  [<ffffffff813d3cec>] ? debug_object_active_state+0xfc/0x150
> [   19.914910]  [<ffffffff8178c21b>] ? _raw_spin_unlock_irqrestore+0x4b/0x80
> [   19.914910]  [<ffffffff8115e0a7>] ? trace_preempt_on+0x7/0x100
> [   19.914910]  [<ffffffff810991c3>] ? preempt_count_sub+0x73/0xf0
> [   19.914910]  [<ffffffff8178c21b>] _raw_spin_unlock_irqrestore+0x4b/0x80
> [   19.914910]  [<ffffffff813d3cec>] debug_object_active_state+0xfc/0x150
> [   19.914910]  [<ffffffff812035f0>] ? get_max_files+0x20/0x20
> [   19.914910]  [<ffffffff810e2d2f>] __call_rcu.constprop.67+0x5f/0x350
> [   19.914910]  [<ffffffff810e3097>] call_rcu+0x17/0x20
> [   19.914910]  [<ffffffff81203843>] __fput+0x183/0x200
> 
> so if I do call_rcu from free_work_cb, it's equally bad
> as calling call_rcu from trace_preempt_on
> 
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-17 21:36                         ` Paul E. McKenney
@ 2015-06-17 23:58                           ` Alexei Starovoitov
  2015-06-18  0:20                             ` Paul E. McKenney
  0 siblings, 1 reply; 41+ messages in thread
From: Alexei Starovoitov @ 2015-06-17 23:58 UTC (permalink / raw)
  To: paulmck; +Cc: Daniel Wagner, LKML, rostedt

On 6/17/15 2:36 PM, Paul E. McKenney wrote:
> Well, you do need to have something in each element to allow them to be
> tracked.  You could indeed use llist_add() to maintain the per-CPU list,
> and then use llist_del_all() bulk-remove all the elements from the per-CPU
> list.  You can then pass each element in turn to kfree_rcu().  And yes,
> I am suggesting that you open-code this, as it is going to be easier to
> handle your special case than to provide a fully general solution.  For
> one thing, the general solution would require a full rcu_head to track
> offset and next.  In contrast, you can special-case the offset.  And
> ignore the overload special cases.

yes. all makes sense.

 > Locklessly enqueue onto a per-CPU list, but yes.  The freeing is up to

yes. per-cpu llist indeed.

 > you -- you get called just before exit from __call_rcu(), and get to
 > figure out what to do.
 >
 > My guess would be if not in interrupt and not recursively invoked,
 > atomically remove all the elements from the list, then pass each to
 > kfree_rcu(), and finally let things take their course from there.
 > The llist APIs look like they would work.

The above, and the 'just before the exit from __call_rcu()' part of the
suggestion, I still don't understand.
To avoid reentry into call_rcu I can either create 1 or N new kthreads
or work_queue and do manual wakeups, but that's very specialized and I
don't want to permanently waste them, so I'm thinking to llist_add into
per-cpu llists and do llist_del_all in rcu_process_callbacks() to take
them from these llists and call kfree_rcu on them.
The llist_add part will also do:
if (!rcu_is_watching()) invoke_rcu_core();
to raise softirq when necessary.
So at the end it will look like two phase kfree_rcu.
I'll try to code it up and see if it explodes :)


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: call_rcu from trace_preempt
  2015-06-17 23:58                           ` Alexei Starovoitov
@ 2015-06-18  0:20                             ` Paul E. McKenney
  0 siblings, 0 replies; 41+ messages in thread
From: Paul E. McKenney @ 2015-06-18  0:20 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Daniel Wagner, LKML, rostedt

On Wed, Jun 17, 2015 at 04:58:48PM -0700, Alexei Starovoitov wrote:
> On 6/17/15 2:36 PM, Paul E. McKenney wrote:
> >Well, you do need to have something in each element to allow them to be
> >tracked.  You could indeed use llist_add() to maintain the per-CPU list,
> >and then use llist_del_all() bulk-remove all the elements from the per-CPU
> >list.  You can then pass each element in turn to kfree_rcu().  And yes,
> >I am suggesting that you open-code this, as it is going to be easier to
> >handle your special case then to provide a fully general solution.  For
> >one thing, the general solution would require a full rcu_head to track
> >offset and next.  In contrast, you can special-case the offset.  And
> >ignore the overload special cases.
> 
> yes. all makes sense.
> 
> > Locklessly enqueue onto a per-CPU list, but yes.  The freeing is up to
> 
> yes. per-cpu llist indeed.
> 
> > you -- you get called just before exit from __call_rcu(), and get to
> > figure out what to do.
> >
> > My guess would be if not in interrupt and not recursively invoked,
> > atomically remove all the elements from the list, then pass each to
> > kfree_rcu(), and finally let things take their course from there.
> > The llist APIs look like they would work.
> 
> The above, and the 'just before the exit from __call_rcu()' part of the
> suggestion, I still don't understand.
> To avoid reentry into call_rcu I can either create 1 or N new kthreads
> or work_queue and do manual wakeups, but that's very specialized and I
> don't want to permanently waste them, so I'm thinking to llist_add into
> per-cpu llists and do llist_del_all in rcu_process_callbacks() to take
> them from these llists and call kfree_rcu on them.

Another option is to drain the lists the next time you do an allocation.
That would avoid hooking both __call_rcu() and rcu_process_callbacks().
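
For what it's worth, a sketch of that variant, reusing a per-CPU
deferred_frees llist like the one above (hypothetical names again): the
map's update path drains the list around its own kmalloc(), so neither
__call_rcu() nor rcu_process_callbacks() needs a hook:

/* Hypothetical helper, called from htab_map_update_elem() next to the
 * kmalloc() of the new element: hand any previously deferred elements
 * to kfree_rcu() from the allocation path.
 */
static void drain_deferred_frees_at_alloc(void)
{
	struct llist_node *n;
	struct htab_elem *e;

	n = llist_del_all(this_cpu_ptr(&deferred_frees));
	llist_for_each_entry(e, n, llist)
		kfree_rcu(e, rcu);
}

The trade-off is that deferred elements can sit on the list until the next
update on that CPU.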

							Thanx, Paul

> The llist_add part will also do:
> if (!rcu_is_watching()) invoke_rcu_core();
> to raise softirq when necessary.
> So at the end it will look like two phase kfree_rcu.
> I'll try to code it up and see if it explodes :)


^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2015-06-18  0:20 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-15 22:24 call_rcu from trace_preempt Alexei Starovoitov
2015-06-15 23:07 ` Paul E. McKenney
2015-06-16  1:09   ` Alexei Starovoitov
2015-06-16  2:14     ` Paul E. McKenney
2015-06-16  5:45       ` Alexei Starovoitov
2015-06-16  6:06         ` Daniel Wagner
2015-06-16  6:25           ` Alexei Starovoitov
2015-06-16  6:34             ` Daniel Wagner
2015-06-16  6:46               ` Alexei Starovoitov
2015-06-16  6:54                 ` Daniel Wagner
2015-06-16 12:27         ` Paul E. McKenney
2015-06-16 12:38           ` Daniel Wagner
2015-06-16 14:16             ` Paul E. McKenney
2015-06-16 15:43               ` Steven Rostedt
2015-06-16 16:07                 ` Paul E. McKenney
2015-06-16 17:13                   ` Daniel Wagner
2015-06-16 15:41             ` Steven Rostedt
2015-06-16 15:52               ` Steven Rostedt
2015-06-16 17:11               ` Daniel Wagner
2015-06-16 17:20             ` Alexei Starovoitov
2015-06-16 17:37               ` Steven Rostedt
2015-06-17  0:33                 ` Alexei Starovoitov
2015-06-17  0:47                   ` Steven Rostedt
2015-06-17  1:04                     ` Alexei Starovoitov
2015-06-17  1:19                       ` Steven Rostedt
2015-06-17  8:11               ` Daniel Wagner
2015-06-17  9:05                 ` Daniel Wagner
2015-06-17 18:39                   ` Alexei Starovoitov
2015-06-17 20:37                     ` Paul E. McKenney
2015-06-17 20:53                       ` Alexei Starovoitov
2015-06-17 21:36                         ` Paul E. McKenney
2015-06-17 23:58                           ` Alexei Starovoitov
2015-06-18  0:20                             ` Paul E. McKenney
2015-06-16 15:37           ` Steven Rostedt
2015-06-16 16:05             ` Paul E. McKenney
2015-06-16 17:14               ` Alexei Starovoitov
2015-06-16 17:39                 ` Paul E. McKenney
2015-06-16 18:57                   ` Steven Rostedt
2015-06-16 19:20                     ` Paul E. McKenney
2015-06-16 19:29                       ` Steven Rostedt
2015-06-16 19:34                         ` Paul E. McKenney
