* New RCU related warning due to rcu_preempt_depth() changes @ 2012-04-17 8:42 Sasha Levin 2012-04-17 15:05 ` Paul E. McKenney 0 siblings, 1 reply; 7+ messages in thread From: Sasha Levin @ 2012-04-17 8:42 UTC (permalink / raw) To: Paul McKenney; +Cc: Dave Jones, linux-kernel@vger.kernel.org List Hi Paul, It looks like commit 7298b03 ("rcu: Move __rcu_read_lock() and __rcu_read_unlock() to per-CPU variables") is causing the following warning (I've added the extra fields on the second line): [ 77.330920] BUG: sleeping function called from invalid context at mm/memory.c:3933 [ 77.336571] in_atomic(): 0, irqs_disabled(): 0, preempt count: 0, preempt offset: 0, rcu depth: 1, pid: 5669, name: trinity [ 77.344135] no locks held by trinity/5669. [ 77.349644] Pid: 5669, comm: trinity Tainted: G W 3.4.0-rc3-next-20120417-sasha-dirty #83 [ 77.354401] Call Trace: [ 77.355956] [<ffffffff810e83f3>] __might_sleep+0x1f3/0x210 [ 77.358811] [<ffffffff81198eaf>] might_fault+0x2f/0xa0 [ 77.361997] [<ffffffff810e3228>] schedule_tail+0x88/0xb0 [ 77.364671] [<ffffffff826a01d3>] ret_from_fork+0x13/0x80 As you can see, rcu_preempt_depth() returns 1 when running in that context, which looks pretty odd. Thanks. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: New RCU related warning due to rcu_preempt_depth() changes 2012-04-17 8:42 New RCU related warning due to rcu_preempt_depth() changes Sasha Levin @ 2012-04-17 15:05 ` Paul E. McKenney 2012-04-17 15:36 ` Sasha Levin 0 siblings, 1 reply; 7+ messages in thread From: Paul E. McKenney @ 2012-04-17 15:05 UTC (permalink / raw) To: Sasha Levin; +Cc: Dave Jones, linux-kernel@vger.kernel.org List On Tue, Apr 17, 2012 at 10:42:47AM +0200, Sasha Levin wrote: > Hi Paul, > > It looks like commit 7298b03 ("rcu: Move __rcu_read_lock() and > __rcu_read_unlock() to per-CPU variables") is causing the following > warning (I've added the extra fields on the second line): > > [ 77.330920] BUG: sleeping function called from invalid context at > mm/memory.c:3933 > [ 77.336571] in_atomic(): 0, irqs_disabled(): 0, preempt count: 0, > preempt offset: 0, rcu depth: 1, pid: 5669, name: trinity > [ 77.344135] no locks held by trinity/5669. > [ 77.349644] Pid: 5669, comm: trinity Tainted: G W > 3.4.0-rc3-next-20120417-sasha-dirty #83 > [ 77.354401] Call Trace: > [ 77.355956] [<ffffffff810e83f3>] __might_sleep+0x1f3/0x210 > [ 77.358811] [<ffffffff81198eaf>] might_fault+0x2f/0xa0 > [ 77.361997] [<ffffffff810e3228>] schedule_tail+0x88/0xb0 > [ 77.364671] [<ffffffff826a01d3>] ret_from_fork+0x13/0x80 > > As you can see, rcu_preempt_depth() returns 1 when running in that > context, which looks pretty odd. Ouch!!! So it looks like I missed a place where I need to save and restore the new per-CPU rcu_read_lock_nesting and rcu_read_unlock_special variables. My (probably hopelessly naive) guess is that I need to add a rcu_switch_from() and rcu_switch_to() into schedule_tail(), but to make rcu_switch_from() take the task_struct pointer as an argument, passing in prev. Does this make sense, or am I still missing something here? Thanx, Paul ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: New RCU related warning due to rcu_preempt_depth() changes 2012-04-17 15:05 ` Paul E. McKenney @ 2012-04-17 15:36 ` Sasha Levin 2012-04-17 15:53 ` Paul E. McKenney 0 siblings, 1 reply; 7+ messages in thread From: Sasha Levin @ 2012-04-17 15:36 UTC (permalink / raw) To: paulmck; +Cc: Dave Jones, linux-kernel@vger.kernel.org List On Tue, Apr 17, 2012 at 5:05 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > On Tue, Apr 17, 2012 at 10:42:47AM +0200, Sasha Levin wrote: >> Hi Paul, >> >> It looks like commit 7298b03 ("rcu: Move __rcu_read_lock() and >> __rcu_read_unlock() to per-CPU variables") is causing the following >> warning (I've added the extra fields on the second line): >> >> [ 77.330920] BUG: sleeping function called from invalid context at >> mm/memory.c:3933 >> [ 77.336571] in_atomic(): 0, irqs_disabled(): 0, preempt count: 0, >> preempt offset: 0, rcu depth: 1, pid: 5669, name: trinity >> [ 77.344135] no locks held by trinity/5669. >> [ 77.349644] Pid: 5669, comm: trinity Tainted: G W >> 3.4.0-rc3-next-20120417-sasha-dirty #83 >> [ 77.354401] Call Trace: >> [ 77.355956] [<ffffffff810e83f3>] __might_sleep+0x1f3/0x210 >> [ 77.358811] [<ffffffff81198eaf>] might_fault+0x2f/0xa0 >> [ 77.361997] [<ffffffff810e3228>] schedule_tail+0x88/0xb0 >> [ 77.364671] [<ffffffff826a01d3>] ret_from_fork+0x13/0x80 >> >> As you can see, rcu_preempt_depth() returns 1 when running in that >> context, which looks pretty odd. > > Ouch!!! > > So it looks like I missed a place where I need to save and restore > the new per-CPU rcu_read_lock_nesting and rcu_read_unlock_special > variables. My (probably hopelessly naive) guess is that I need to add > a rcu_switch_from() and rcu_switch_to() into schedule_tail(), but to > make rcu_switch_from() take the task_struct pointer as an argument, > passing in prev. > > Does this make sense, or am I still missing something here? I've let the test run for a bit more, and it appears that I'm getting this warning from lots of different sources, would this schedule_tail() fix all of them? Here's several traces for reference: [ 223.068875] [<ffffffff810e83f3>] __might_sleep+0x1f3/0x210 [ 223.070719] [<ffffffff810b2a05>] close_files+0x1d5/0x220 [ 223.072531] [<ffffffff810b2830>] ? find_new_reaper+0x230/0x230 [ 223.076325] [<ffffffff810b4811>] put_files_struct+0x21/0x1b0 [ 223.080649] [<ffffffff8269eb20>] ? _raw_spin_unlock+0x30/0x60 [ 223.084455] [<ffffffff810b4a5d>] exit_files+0x4d/0x60 [ 223.087967] [<ffffffff810b51ac>] do_exit+0x28c/0x470 [ 223.091369] [<ffffffff810e44d1>] ? get_parent_ip+0x11/0x50 [ 223.093190] [<ffffffff810b5473>] do_group_exit+0xa3/0xe0 [ 223.095061] [<ffffffff810c4759>] get_signal_to_deliver+0x389/0x400 [ 223.098400] [<ffffffff8104bce2>] do_signal+0x42/0x120 [ 223.100222] [<ffffffff8104c4d7>] ? do_divide_error+0xa7/0xb0 [ 223.102267] [<ffffffff8269fa7f>] ? retint_signal+0x11/0x92 [ 223.104145] [<ffffffff8104be34>] do_notify_resume+0x54/0xa0 [ 223.106033] [<ffffffff8269fabb>] retint_signal+0x4d/0x92 [ 176.217632] [<ffffffff810e83f3>] __might_sleep+0x1f3/0x210 [ 176.223583] [<ffffffff81081b83>] do_page_fault+0x243/0x4f0 [ 176.229932] [<ffffffff811151ca>] ? __lock_release+0x1ba/0x1d0 [ 176.233651] [<ffffffff8269ec1b>] ? _raw_spin_unlock_irq+0x2b/0x80 [ 176.239389] [<ffffffff810e44d1>] ? get_parent_ip+0x11/0x50 [ 176.242507] [<ffffffff810e469e>] ? sub_preempt_count+0xae/0xe0 [ 176.248795] [<ffffffff8269ec41>] ? _raw_spin_unlock_irq+0x51/0x80 [ 176.255454] [<ffffffff81079e51>] do_async_page_fault+0x31/0xa0 [ 176.260342] [<ffffffff8269fcd5>] async_page_fault+0x25/0x30 [ 173.587864] [<ffffffff810e83f3>] __might_sleep+0x1f3/0x210 [ 173.593134] [<ffffffff811f5542>] ? __d_alloc+0x32/0x1a0 [ 173.603730] [<ffffffff811c0f8d>] kmem_cache_alloc+0x4d/0x160 [ 173.604746] [<ffffffff811f5e40>] ? __d_lookup_rcu+0x240/0x240 [ 173.608932] [<ffffffff811f5542>] __d_alloc+0x32/0x1a0 [ 173.612444] [<ffffffff811f56f3>] d_alloc+0x23/0x80 [ 173.616887] [<ffffffff811e773b>] __lookup_hash+0x9b/0x110 [ 173.621488] [<ffffffff811e77c4>] lookup_hash+0x14/0x20 [ 173.624395] [<ffffffff811ecc99>] do_unlinkat+0x79/0x1e0 [ 173.626483] [<ffffffff8269ec41>] ? _raw_spin_unlock_irq+0x51/0x80 [ 173.632242] [<ffffffff826a02e9>] ? sysret_check+0x22/0x5d [ 173.637884] [<ffffffff8186db5e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 173.642176] [<ffffffff811ece51>] sys_unlink+0x11/0x20 [ 173.645320] [<ffffffff826a02bd>] system_call_fastpath+0x1a/0x1f ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: New RCU related warning due to rcu_preempt_depth() changes 2012-04-17 15:36 ` Sasha Levin @ 2012-04-17 15:53 ` Paul E. McKenney 2012-04-17 16:45 ` Paul E. McKenney 0 siblings, 1 reply; 7+ messages in thread From: Paul E. McKenney @ 2012-04-17 15:53 UTC (permalink / raw) To: Sasha Levin; +Cc: Dave Jones, linux-kernel@vger.kernel.org List On Tue, Apr 17, 2012 at 05:36:59PM +0200, Sasha Levin wrote: > On Tue, Apr 17, 2012 at 5:05 PM, Paul E. McKenney > <paulmck@linux.vnet.ibm.com> wrote: > > On Tue, Apr 17, 2012 at 10:42:47AM +0200, Sasha Levin wrote: > >> Hi Paul, > >> > >> It looks like commit 7298b03 ("rcu: Move __rcu_read_lock() and > >> __rcu_read_unlock() to per-CPU variables") is causing the following > >> warning (I've added the extra fields on the second line): > >> > >> [ 77.330920] BUG: sleeping function called from invalid context at > >> mm/memory.c:3933 > >> [ 77.336571] in_atomic(): 0, irqs_disabled(): 0, preempt count: 0, > >> preempt offset: 0, rcu depth: 1, pid: 5669, name: trinity > >> [ 77.344135] no locks held by trinity/5669. > >> [ 77.349644] Pid: 5669, comm: trinity Tainted: G W > >> 3.4.0-rc3-next-20120417-sasha-dirty #83 > >> [ 77.354401] Call Trace: > >> [ 77.355956] [<ffffffff810e83f3>] __might_sleep+0x1f3/0x210 > >> [ 77.358811] [<ffffffff81198eaf>] might_fault+0x2f/0xa0 > >> [ 77.361997] [<ffffffff810e3228>] schedule_tail+0x88/0xb0 > >> [ 77.364671] [<ffffffff826a01d3>] ret_from_fork+0x13/0x80 > >> > >> As you can see, rcu_preempt_depth() returns 1 when running in that > >> context, which looks pretty odd. > > > > Ouch!!! > > > > So it looks like I missed a place where I need to save and restore > > the new per-CPU rcu_read_lock_nesting and rcu_read_unlock_special > > variables. My (probably hopelessly naive) guess is that I need to add > > a rcu_switch_from() and rcu_switch_to() into schedule_tail(), but to > > make rcu_switch_from() take the task_struct pointer as an argument, > > passing in prev. > > > > Does this make sense, or am I still missing something here? > > I've let the test run for a bit more, and it appears that I'm getting > this warning from lots of different sources, would this > schedule_tail() fix all of them? If I understand the failure correctly, yes. If the task switches without RCU paying attention, the nesting count for both the outgoing and the incoming tasks can get messed up. The messed-up counts could easily cause problems downstream. Of course, there might well be additional bugs. I will put a speculative patch together and send it along. Thanx, Paul > Here's several traces for reference: > > [ 223.068875] [<ffffffff810e83f3>] __might_sleep+0x1f3/0x210 > [ 223.070719] [<ffffffff810b2a05>] close_files+0x1d5/0x220 > [ 223.072531] [<ffffffff810b2830>] ? find_new_reaper+0x230/0x230 > [ 223.076325] [<ffffffff810b4811>] put_files_struct+0x21/0x1b0 > [ 223.080649] [<ffffffff8269eb20>] ? _raw_spin_unlock+0x30/0x60 > [ 223.084455] [<ffffffff810b4a5d>] exit_files+0x4d/0x60 > [ 223.087967] [<ffffffff810b51ac>] do_exit+0x28c/0x470 > [ 223.091369] [<ffffffff810e44d1>] ? get_parent_ip+0x11/0x50 > [ 223.093190] [<ffffffff810b5473>] do_group_exit+0xa3/0xe0 > [ 223.095061] [<ffffffff810c4759>] get_signal_to_deliver+0x389/0x400 > [ 223.098400] [<ffffffff8104bce2>] do_signal+0x42/0x120 > [ 223.100222] [<ffffffff8104c4d7>] ? do_divide_error+0xa7/0xb0 > [ 223.102267] [<ffffffff8269fa7f>] ? retint_signal+0x11/0x92 > [ 223.104145] [<ffffffff8104be34>] do_notify_resume+0x54/0xa0 > [ 223.106033] [<ffffffff8269fabb>] retint_signal+0x4d/0x92 > > [ 176.217632] [<ffffffff810e83f3>] __might_sleep+0x1f3/0x210 > [ 176.223583] [<ffffffff81081b83>] do_page_fault+0x243/0x4f0 > [ 176.229932] [<ffffffff811151ca>] ? __lock_release+0x1ba/0x1d0 > [ 176.233651] [<ffffffff8269ec1b>] ? _raw_spin_unlock_irq+0x2b/0x80 > [ 176.239389] [<ffffffff810e44d1>] ? get_parent_ip+0x11/0x50 > [ 176.242507] [<ffffffff810e469e>] ? sub_preempt_count+0xae/0xe0 > [ 176.248795] [<ffffffff8269ec41>] ? _raw_spin_unlock_irq+0x51/0x80 > [ 176.255454] [<ffffffff81079e51>] do_async_page_fault+0x31/0xa0 > [ 176.260342] [<ffffffff8269fcd5>] async_page_fault+0x25/0x30 > > [ 173.587864] [<ffffffff810e83f3>] __might_sleep+0x1f3/0x210 > [ 173.593134] [<ffffffff811f5542>] ? __d_alloc+0x32/0x1a0 > [ 173.603730] [<ffffffff811c0f8d>] kmem_cache_alloc+0x4d/0x160 > [ 173.604746] [<ffffffff811f5e40>] ? __d_lookup_rcu+0x240/0x240 > [ 173.608932] [<ffffffff811f5542>] __d_alloc+0x32/0x1a0 > [ 173.612444] [<ffffffff811f56f3>] d_alloc+0x23/0x80 > [ 173.616887] [<ffffffff811e773b>] __lookup_hash+0x9b/0x110 > [ 173.621488] [<ffffffff811e77c4>] lookup_hash+0x14/0x20 > [ 173.624395] [<ffffffff811ecc99>] do_unlinkat+0x79/0x1e0 > [ 173.626483] [<ffffffff8269ec41>] ? _raw_spin_unlock_irq+0x51/0x80 > [ 173.632242] [<ffffffff826a02e9>] ? sysret_check+0x22/0x5d > [ 173.637884] [<ffffffff8186db5e>] ? trace_hardirqs_on_thunk+0x3a/0x3f > [ 173.642176] [<ffffffff811ece51>] sys_unlink+0x11/0x20 > [ 173.645320] [<ffffffff826a02bd>] system_call_fastpath+0x1a/0x1f > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: New RCU related warning due to rcu_preempt_depth() changes 2012-04-17 15:53 ` Paul E. McKenney @ 2012-04-17 16:45 ` Paul E. McKenney 2012-04-18 5:29 ` Sasha Levin 0 siblings, 1 reply; 7+ messages in thread From: Paul E. McKenney @ 2012-04-17 16:45 UTC (permalink / raw) To: Sasha Levin; +Cc: Dave Jones, linux-kernel@vger.kernel.org List On Tue, Apr 17, 2012 at 08:53:16AM -0700, Paul E. McKenney wrote: > On Tue, Apr 17, 2012 at 05:36:59PM +0200, Sasha Levin wrote: > > On Tue, Apr 17, 2012 at 5:05 PM, Paul E. McKenney > > <paulmck@linux.vnet.ibm.com> wrote: > > > On Tue, Apr 17, 2012 at 10:42:47AM +0200, Sasha Levin wrote: > > >> Hi Paul, > > >> > > >> It looks like commit 7298b03 ("rcu: Move __rcu_read_lock() and > > >> __rcu_read_unlock() to per-CPU variables") is causing the following > > >> warning (I've added the extra fields on the second line): > > >> > > >> [ 77.330920] BUG: sleeping function called from invalid context at > > >> mm/memory.c:3933 > > >> [ 77.336571] in_atomic(): 0, irqs_disabled(): 0, preempt count: 0, > > >> preempt offset: 0, rcu depth: 1, pid: 5669, name: trinity > > >> [ 77.344135] no locks held by trinity/5669. > > >> [ 77.349644] Pid: 5669, comm: trinity Tainted: G W > > >> 3.4.0-rc3-next-20120417-sasha-dirty #83 > > >> [ 77.354401] Call Trace: > > >> [ 77.355956] [<ffffffff810e83f3>] __might_sleep+0x1f3/0x210 > > >> [ 77.358811] [<ffffffff81198eaf>] might_fault+0x2f/0xa0 > > >> [ 77.361997] [<ffffffff810e3228>] schedule_tail+0x88/0xb0 > > >> [ 77.364671] [<ffffffff826a01d3>] ret_from_fork+0x13/0x80 > > >> > > >> As you can see, rcu_preempt_depth() returns 1 when running in that > > >> context, which looks pretty odd. > > > > > > Ouch!!! > > > > > > So it looks like I missed a place where I need to save and restore > > > the new per-CPU rcu_read_lock_nesting and rcu_read_unlock_special > > > variables. My (probably hopelessly naive) guess is that I need to add > > > a rcu_switch_from() and rcu_switch_to() into schedule_tail(), but to > > > make rcu_switch_from() take the task_struct pointer as an argument, > > > passing in prev. > > > > > > Does this make sense, or am I still missing something here? > > > > I've let the test run for a bit more, and it appears that I'm getting > > this warning from lots of different sources, would this > > schedule_tail() fix all of them? > > If I understand the failure correctly, yes. If the task switches without > RCU paying attention, the nesting count for both the outgoing and the > incoming tasks can get messed up. The messed-up counts could easily > cause problems downstream. > > Of course, there might well be additional bugs. > > I will put a speculative patch together and send it along. And here it is, testing just started. Thanx, Paul ------------------------------------------------------------------------ rcu: Add RCU context switching to schedule_tail() The new rcu_read_lock_nesting and rcu_read_unlock_special per-CPU variables must be saved and restored at every context switch, including those involving schedule_tail(). This commit therefore adds the saving and restoring to schedul_tail(). Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c index 25a7fea..32272d4 100644 --- a/arch/um/drivers/mconsole_kern.c +++ b/arch/um/drivers/mconsole_kern.c @@ -704,7 +704,7 @@ static void stack_proc(void *arg) struct task_struct *from = current, *to = arg; to->thread.saved_task = from; - rcu_switch_from(); + rcu_switch_from(from); switch_to(from, to, from); rcu_switch_to(); } diff --git a/include/linux/sched.h b/include/linux/sched.h index f2468cb..0d48609 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1937,10 +1937,8 @@ static inline void rcu_copy_process(struct task_struct *p) * * The caller must have disabled preemption. */ -static inline void rcu_switch_from(void) +static inline void rcu_switch_from(struct task_struct *t) { - struct task_struct *t = current; - if (__this_cpu_read(rcu_read_lock_nesting) != 0) rcu_preempt_note_context_switch(); t->rcu_read_lock_nesting_save = __this_cpu_read(rcu_read_lock_nesting); @@ -1991,7 +1989,7 @@ static inline void rcu_copy_process(struct task_struct *p) { } -static inline void rcu_switch_from(void) +static inline void rcu_switch_from(struct task_struct *t) { } diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 51ce537..17ae267 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2024,6 +2024,8 @@ asmlinkage void schedule_tail(struct task_struct *prev) { struct rq *rq = this_rq(); + rcu_switch_from(prev); + rcu_switch_to(); finish_task_switch(rq, prev); /* @@ -2083,7 +2085,7 @@ context_switch(struct rq *rq, struct task_struct *prev, #endif /* Here we just switch the register state and the stack. */ - rcu_switch_from(); + rcu_switch_from(current); switch_to(prev, next, prev); rcu_switch_to(); ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: New RCU related warning due to rcu_preempt_depth() changes 2012-04-17 16:45 ` Paul E. McKenney @ 2012-04-18 5:29 ` Sasha Levin 2012-04-18 14:11 ` Paul E. McKenney 0 siblings, 1 reply; 7+ messages in thread From: Sasha Levin @ 2012-04-18 5:29 UTC (permalink / raw) To: paulmck; +Cc: Dave Jones, linux-kernel@vger.kernel.org List On Tue, Apr 17, 2012 at 6:45 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > On Tue, Apr 17, 2012 at 08:53:16AM -0700, Paul E. McKenney wrote: >> On Tue, Apr 17, 2012 at 05:36:59PM +0200, Sasha Levin wrote: >> > On Tue, Apr 17, 2012 at 5:05 PM, Paul E. McKenney >> > <paulmck@linux.vnet.ibm.com> wrote: >> > > On Tue, Apr 17, 2012 at 10:42:47AM +0200, Sasha Levin wrote: >> > >> Hi Paul, >> > >> >> > >> It looks like commit 7298b03 ("rcu: Move __rcu_read_lock() and >> > >> __rcu_read_unlock() to per-CPU variables") is causing the following >> > >> warning (I've added the extra fields on the second line): >> > >> >> > >> [ 77.330920] BUG: sleeping function called from invalid context at >> > >> mm/memory.c:3933 >> > >> [ 77.336571] in_atomic(): 0, irqs_disabled(): 0, preempt count: 0, >> > >> preempt offset: 0, rcu depth: 1, pid: 5669, name: trinity >> > >> [ 77.344135] no locks held by trinity/5669. >> > >> [ 77.349644] Pid: 5669, comm: trinity Tainted: G W >> > >> 3.4.0-rc3-next-20120417-sasha-dirty #83 >> > >> [ 77.354401] Call Trace: >> > >> [ 77.355956] [<ffffffff810e83f3>] __might_sleep+0x1f3/0x210 >> > >> [ 77.358811] [<ffffffff81198eaf>] might_fault+0x2f/0xa0 >> > >> [ 77.361997] [<ffffffff810e3228>] schedule_tail+0x88/0xb0 >> > >> [ 77.364671] [<ffffffff826a01d3>] ret_from_fork+0x13/0x80 >> > >> >> > >> As you can see, rcu_preempt_depth() returns 1 when running in that >> > >> context, which looks pretty odd. >> > > >> > > Ouch!!! >> > > >> > > So it looks like I missed a place where I need to save and restore >> > > the new per-CPU rcu_read_lock_nesting and rcu_read_unlock_special >> > > variables. My (probably hopelessly naive) guess is that I need to add >> > > a rcu_switch_from() and rcu_switch_to() into schedule_tail(), but to >> > > make rcu_switch_from() take the task_struct pointer as an argument, >> > > passing in prev. >> > > >> > > Does this make sense, or am I still missing something here? >> > >> > I've let the test run for a bit more, and it appears that I'm getting >> > this warning from lots of different sources, would this >> > schedule_tail() fix all of them? >> >> If I understand the failure correctly, yes. If the task switches without >> RCU paying attention, the nesting count for both the outgoing and the >> incoming tasks can get messed up. The messed-up counts could easily >> cause problems downstream. >> >> Of course, there might well be additional bugs. >> >> I will put a speculative patch together and send it along. > > And here it is, testing just started. > > Thanx, Paul > > ------------------------------------------------------------------------ > > rcu: Add RCU context switching to schedule_tail() > > The new rcu_read_lock_nesting and rcu_read_unlock_special per-CPU > variables must be saved and restored at every context switch, including > those involving schedule_tail(). This commit therefore adds the saving > and restoring to schedul_tail(). > > Reported-by: Sasha Levin <levinsasha928@gmail.com> > Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > Looks good here. Thanks. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: New RCU related warning due to rcu_preempt_depth() changes 2012-04-18 5:29 ` Sasha Levin @ 2012-04-18 14:11 ` Paul E. McKenney 0 siblings, 0 replies; 7+ messages in thread From: Paul E. McKenney @ 2012-04-18 14:11 UTC (permalink / raw) To: Sasha Levin; +Cc: Dave Jones, linux-kernel@vger.kernel.org List On Wed, Apr 18, 2012 at 07:29:19AM +0200, Sasha Levin wrote: > On Tue, Apr 17, 2012 at 6:45 PM, Paul E. McKenney > <paulmck@linux.vnet.ibm.com> wrote: > > On Tue, Apr 17, 2012 at 08:53:16AM -0700, Paul E. McKenney wrote: > >> On Tue, Apr 17, 2012 at 05:36:59PM +0200, Sasha Levin wrote: > >> > On Tue, Apr 17, 2012 at 5:05 PM, Paul E. McKenney > >> > <paulmck@linux.vnet.ibm.com> wrote: > >> > > On Tue, Apr 17, 2012 at 10:42:47AM +0200, Sasha Levin wrote: > >> > >> Hi Paul, > >> > >> > >> > >> It looks like commit 7298b03 ("rcu: Move __rcu_read_lock() and > >> > >> __rcu_read_unlock() to per-CPU variables") is causing the following > >> > >> warning (I've added the extra fields on the second line): > >> > >> > >> > >> [ 77.330920] BUG: sleeping function called from invalid context at > >> > >> mm/memory.c:3933 > >> > >> [ 77.336571] in_atomic(): 0, irqs_disabled(): 0, preempt count: 0, > >> > >> preempt offset: 0, rcu depth: 1, pid: 5669, name: trinity > >> > >> [ 77.344135] no locks held by trinity/5669. > >> > >> [ 77.349644] Pid: 5669, comm: trinity Tainted: G W > >> > >> 3.4.0-rc3-next-20120417-sasha-dirty #83 > >> > >> [ 77.354401] Call Trace: > >> > >> [ 77.355956] [<ffffffff810e83f3>] __might_sleep+0x1f3/0x210 > >> > >> [ 77.358811] [<ffffffff81198eaf>] might_fault+0x2f/0xa0 > >> > >> [ 77.361997] [<ffffffff810e3228>] schedule_tail+0x88/0xb0 > >> > >> [ 77.364671] [<ffffffff826a01d3>] ret_from_fork+0x13/0x80 > >> > >> > >> > >> As you can see, rcu_preempt_depth() returns 1 when running in that > >> > >> context, which looks pretty odd. > >> > > > >> > > Ouch!!! > >> > > > >> > > So it looks like I missed a place where I need to save and restore > >> > > the new per-CPU rcu_read_lock_nesting and rcu_read_unlock_special > >> > > variables. My (probably hopelessly naive) guess is that I need to add > >> > > a rcu_switch_from() and rcu_switch_to() into schedule_tail(), but to > >> > > make rcu_switch_from() take the task_struct pointer as an argument, > >> > > passing in prev. > >> > > > >> > > Does this make sense, or am I still missing something here? > >> > > >> > I've let the test run for a bit more, and it appears that I'm getting > >> > this warning from lots of different sources, would this > >> > schedule_tail() fix all of them? > >> > >> If I understand the failure correctly, yes. If the task switches without > >> RCU paying attention, the nesting count for both the outgoing and the > >> incoming tasks can get messed up. The messed-up counts could easily > >> cause problems downstream. > >> > >> Of course, there might well be additional bugs. > >> > >> I will put a speculative patch together and send it along. > > > > And here it is, testing just started. > > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > rcu: Add RCU context switching to schedule_tail() > > > > The new rcu_read_lock_nesting and rcu_read_unlock_special per-CPU > > variables must be saved and restored at every context switch, including > > those involving schedule_tail(). This commit therefore adds the saving > > and restoring to schedul_tail(). > > > > Reported-by: Sasha Levin <levinsasha928@gmail.com> > > Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > Looks good here. Very good! I have added your Tested-by. Thanx, Paul ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2012-04-18 16:24 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-04-17 8:42 New RCU related warning due to rcu_preempt_depth() changes Sasha Levin 2012-04-17 15:05 ` Paul E. McKenney 2012-04-17 15:36 ` Sasha Levin 2012-04-17 15:53 ` Paul E. McKenney 2012-04-17 16:45 ` Paul E. McKenney 2012-04-18 5:29 ` Sasha Levin 2012-04-18 14:11 ` Paul E. McKenney
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).