* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
       [not found] <tip-80d02085d99039b3b7f3a73c8896226b0cb1ba07@git.kernel.org>
@ 2011-05-20 21:04 ` Yinghai Lu
  2011-05-20 22:42   ` Paul E. McKenney
  0 siblings, 1 reply; 45+ messages in thread
From: Yinghai Lu @ 2011-05-20 21:04 UTC (permalink / raw)
  To: linux-kernel, mingo, hpa, paulmck, tglx, mingo; +Cc: linux-tip-commits

On 05/19/2011 02:37 PM, tip-bot for Paul E. McKenney wrote:
> Commit-ID:  80d02085d99039b3b7f3a73c8896226b0cb1ba07
> Gitweb:     http://git.kernel.org/tip/80d02085d99039b3b7f3a73c8896226b0cb1ba07
> Author:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> AuthorDate: Thu, 12 May 2011 01:08:07 -0700
> Committer:  Ingo Molnar <mingo@elte.hu>
> CommitDate: Thu, 19 May 2011 23:25:29 +0200
>
> Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
>
> This reverts commit e59fb3120becfb36b22ddb8bd27d065d3cdca499.
>
> This reversion was due to (extreme) boot-time slowdowns on SPARC seen by
> Yinghai Lu and on x86 by Ingo
> .
> This is a non-trivial reversion due to intervening commits.
>
> Conflicts:
>
> 	Documentation/RCU/trace.txt
> 	kernel/rcutree.c
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  Documentation/RCU/trace.txt |   17 ++++--
>  kernel/rcutree.c            |  130 ++++++++++++++++++++++++------------------
>  kernel/rcutree.h            |    9 ++-
>  kernel/rcutree_plugin.h     |    7 +-
>  kernel/rcutree_trace.c      |   12 ++--
>  5 files changed, 102 insertions(+), 73 deletions(-)
>

Current tip/master, which includes this revert and does not set
DEBUG_LOCKING_API_SELFTESTS, still shows the delay:

[   35.419453] cpu_dev_init done
[  128.981770] memory_dev_init done

This step should take only 3 or 4 seconds.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 45+ messages in thread
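For scale, the two printk stamps in the report above are roughly 93.6 seconds apart, against the 3 or 4 seconds expected. A minimal sketch of that arithmetic in C follows; the helper names `printk_stamp()` and `printk_delta()` are made up for illustration and are not part of any kernel or libc API.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse the leading "[ seconds.micros]" stamp of a
 * printk line. Returns the stamp in seconds, or -1.0 if none is found.
 */
static double printk_stamp(const char *line)
{
	double secs;

	if (sscanf(line, " [ %lf", &secs) != 1)
		return -1.0;
	return secs;
}

/*
 * Hypothetical helper: seconds elapsed between two stamped lines, or
 * -1.0 if either line fails to parse. Assumes the lines are in order.
 */
static double printk_delta(const char *earlier, const char *later)
{
	double a = printk_stamp(earlier);
	double b = printk_stamp(later);

	if (a < 0.0 || b < 0.0)
		return -1.0;
	return b - a;
}
```

Called with the two lines from the report, `printk_delta()` gives 128.981770 - 35.419453 = 93.562317 seconds.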
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-20 21:04 ` [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" Yinghai Lu @ 2011-05-20 22:42 ` Paul E. McKenney 2011-05-20 23:09 ` Yinghai Lu 0 siblings, 1 reply; 45+ messages in thread From: Paul E. McKenney @ 2011-05-20 22:42 UTC (permalink / raw) To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo, linux-tip-commits On Fri, May 20, 2011 at 02:04:06PM -0700, Yinghai Lu wrote: > On 05/19/2011 02:37 PM, tip-bot for Paul E. McKenney wrote: > > Commit-ID: 80d02085d99039b3b7f3a73c8896226b0cb1ba07 > > Gitweb: http://git.kernel.org/tip/80d02085d99039b3b7f3a73c8896226b0cb1ba07 > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > AuthorDate: Thu, 12 May 2011 01:08:07 -0700 > > Committer: Ingo Molnar <mingo@elte.hu> > > CommitDate: Thu, 19 May 2011 23:25:29 +0200 > > > > Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" > > > > This reverts commit e59fb3120becfb36b22ddb8bd27d065d3cdca499. > > > > This reversion was due to (extreme) boot-time slowdowns on SPARC seen by > > Yinghai Lu and on x86 by Ingo > > . > > This is a non-trivial reversion due to intervening commits. > > > > Conflicts: > > > > Documentation/RCU/trace.txt > > kernel/rcutree.c > > > > Signed-off-by: Ingo Molnar <mingo@elte.hu> > > --- > > Documentation/RCU/trace.txt | 17 ++++-- > > kernel/rcutree.c | 130 ++++++++++++++++++++++++------------------ > > kernel/rcutree.h | 9 ++- > > kernel/rcutree_plugin.h | 7 +- > > kernel/rcutree_trace.c | 12 ++-- > > 5 files changed, 102 insertions(+), 73 deletions(-) > > > > current tip/master that have this reverting and without setting DEBUG_LOCKING_API_SELFTESTS > > still get delay > > [ 35.419453] cpu_dev_init done > [ 128.981770] memory_dev_init done > > should take only 3 or 4 seconds. Thank you for checking this out. This is with the same configuration you sent earlier? 
(Other than the DEBUG_LOCKING_API_SELFTESTS, of course.) Thanx, Paul ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-20 22:42 ` Paul E. McKenney @ 2011-05-20 23:09 ` Yinghai Lu 2011-05-20 23:14 ` Paul E. McKenney 0 siblings, 1 reply; 45+ messages in thread From: Yinghai Lu @ 2011-05-20 23:09 UTC (permalink / raw) To: paulmck; +Cc: linux-kernel, mingo, hpa, tglx, mingo, linux-tip-commits On 05/20/2011 03:42 PM, Paul E. McKenney wrote: > On Fri, May 20, 2011 at 02:04:06PM -0700, Yinghai Lu wrote: >> On 05/19/2011 02:37 PM, tip-bot for Paul E. McKenney wrote: >>> Commit-ID: 80d02085d99039b3b7f3a73c8896226b0cb1ba07 >>> Gitweb: http://git.kernel.org/tip/80d02085d99039b3b7f3a73c8896226b0cb1ba07 >>> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> >>> AuthorDate: Thu, 12 May 2011 01:08:07 -0700 >>> Committer: Ingo Molnar <mingo@elte.hu> >>> CommitDate: Thu, 19 May 2011 23:25:29 +0200 >>> >>> Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" >>> >>> This reverts commit e59fb3120becfb36b22ddb8bd27d065d3cdca499. >>> >>> This reversion was due to (extreme) boot-time slowdowns on SPARC seen by >>> Yinghai Lu and on x86 by Ingo >>> . >>> This is a non-trivial reversion due to intervening commits. >>> >>> Conflicts: >>> >>> Documentation/RCU/trace.txt >>> kernel/rcutree.c >>> >>> Signed-off-by: Ingo Molnar <mingo@elte.hu> >>> --- >>> Documentation/RCU/trace.txt | 17 ++++-- >>> kernel/rcutree.c | 130 ++++++++++++++++++++++++------------------ >>> kernel/rcutree.h | 9 ++- >>> kernel/rcutree_plugin.h | 7 +- >>> kernel/rcutree_trace.c | 12 ++-- >>> 5 files changed, 102 insertions(+), 73 deletions(-) >>> >> >> current tip/master that have this reverting and without setting DEBUG_LOCKING_API_SELFTESTS >> >> still get delay >> >> [ 35.419453] cpu_dev_init done >> [ 128.981770] memory_dev_init done >> >> should take only 3 or 4 seconds. > > Thank you for checking this out. > > This is with the same configuration you sent earlier? 
(Other than the
> DEBUG_LOCKING_API_SELFTESTS, of course.)
>

It is an 8-socket Westmere-EX, and the kernel was compiled with the gcc from openSUSE 11.3.

If the kernel is compiled on Fedora 14, there is no delay.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-20 23:09 ` Yinghai Lu @ 2011-05-20 23:14 ` Paul E. McKenney 2011-05-20 23:16 ` Yinghai Lu 0 siblings, 1 reply; 45+ messages in thread From: Paul E. McKenney @ 2011-05-20 23:14 UTC (permalink / raw) To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo, linux-tip-commits On Fri, May 20, 2011 at 04:09:22PM -0700, Yinghai Lu wrote: > On 05/20/2011 03:42 PM, Paul E. McKenney wrote: > > On Fri, May 20, 2011 at 02:04:06PM -0700, Yinghai Lu wrote: > >> On 05/19/2011 02:37 PM, tip-bot for Paul E. McKenney wrote: > >>> Commit-ID: 80d02085d99039b3b7f3a73c8896226b0cb1ba07 > >>> Gitweb: http://git.kernel.org/tip/80d02085d99039b3b7f3a73c8896226b0cb1ba07 > >>> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > >>> AuthorDate: Thu, 12 May 2011 01:08:07 -0700 > >>> Committer: Ingo Molnar <mingo@elte.hu> > >>> CommitDate: Thu, 19 May 2011 23:25:29 +0200 > >>> > >>> Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" > >>> > >>> This reverts commit e59fb3120becfb36b22ddb8bd27d065d3cdca499. > >>> > >>> This reversion was due to (extreme) boot-time slowdowns on SPARC seen by > >>> Yinghai Lu and on x86 by Ingo > >>> . > >>> This is a non-trivial reversion due to intervening commits. 
> >>> > >>> Conflicts: > >>> > >>> Documentation/RCU/trace.txt > >>> kernel/rcutree.c > >>> > >>> Signed-off-by: Ingo Molnar <mingo@elte.hu> > >>> --- > >>> Documentation/RCU/trace.txt | 17 ++++-- > >>> kernel/rcutree.c | 130 ++++++++++++++++++++++++------------------ > >>> kernel/rcutree.h | 9 ++- > >>> kernel/rcutree_plugin.h | 7 +- > >>> kernel/rcutree_trace.c | 12 ++-- > >>> 5 files changed, 102 insertions(+), 73 deletions(-) > >>> > >> > >> current tip/master that have this reverting and without setting DEBUG_LOCKING_API_SELFTESTS > >> > >> still get delay > >> > >> [ 35.419453] cpu_dev_init done > >> [ 128.981770] memory_dev_init done > >> > >> should take only 3 or 4 seconds. > > > > Thank you for checking this out. > > > > This is with the same configuration you sent earlier? (Other than the > > DEBUG_LOCKING_API_SELFTESTS, of course.) > > > > 8 socket westmere-ex. and compiled with gcc from opensuse 11.3. > > if compile kernel from fedora 14, there will be not delay. Thank you for the info. Could you please send me the .config files from both builds? Thanx, Paul ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-20 23:14 ` Paul E. McKenney @ 2011-05-20 23:16 ` Yinghai Lu 2011-05-20 23:49 ` Paul E. McKenney 0 siblings, 1 reply; 45+ messages in thread From: Yinghai Lu @ 2011-05-20 23:16 UTC (permalink / raw) To: paulmck; +Cc: linux-kernel, mingo, hpa, tglx, mingo, linux-tip-commits On 05/20/2011 04:14 PM, Paul E. McKenney wrote: > On Fri, May 20, 2011 at 04:09:22PM -0700, Yinghai Lu wrote: >> On 05/20/2011 03:42 PM, Paul E. McKenney wrote: >>> On Fri, May 20, 2011 at 02:04:06PM -0700, Yinghai Lu wrote: >>>> On 05/19/2011 02:37 PM, tip-bot for Paul E. McKenney wrote: >>>>> Commit-ID: 80d02085d99039b3b7f3a73c8896226b0cb1ba07 >>>>> Gitweb: http://git.kernel.org/tip/80d02085d99039b3b7f3a73c8896226b0cb1ba07 >>>>> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> >>>>> AuthorDate: Thu, 12 May 2011 01:08:07 -0700 >>>>> Committer: Ingo Molnar <mingo@elte.hu> >>>>> CommitDate: Thu, 19 May 2011 23:25:29 +0200 >>>>> >>>>> Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" >>>>> >>>>> This reverts commit e59fb3120becfb36b22ddb8bd27d065d3cdca499. >>>>> >>>>> This reversion was due to (extreme) boot-time slowdowns on SPARC seen by >>>>> Yinghai Lu and on x86 by Ingo >>>>> . >>>>> This is a non-trivial reversion due to intervening commits. 
>>>>>
>>>>> Conflicts:
>>>>>
>>>>> 	Documentation/RCU/trace.txt
>>>>> 	kernel/rcutree.c
>>>>>
>>>>> Signed-off-by: Ingo Molnar <mingo@elte.hu>
>>>>> ---
>>>>>  Documentation/RCU/trace.txt |   17 ++++--
>>>>>  kernel/rcutree.c            |  130 ++++++++++++++++++++++++------------------
>>>>>  kernel/rcutree.h            |    9 ++-
>>>>>  kernel/rcutree_plugin.h     |    7 +-
>>>>>  kernel/rcutree_trace.c      |   12 ++--
>>>>>  5 files changed, 102 insertions(+), 73 deletions(-)
>>>>>
>>>>
>>>> current tip/master that have this reverting and without setting DEBUG_LOCKING_API_SELFTESTS
>>>>
>>>> still get delay
>>>>
>>>> [   35.419453] cpu_dev_init done
>>>> [  128.981770] memory_dev_init done
>>>>
>>>> should take only 3 or 4 seconds.
>>>
>>> Thank you for checking this out.
>>>
>>> This is with the same configuration you sent earlier?  (Other than the
>>> DEBUG_LOCKING_API_SELFTESTS, of course.)
>>>
>>
>> 8 socket westmere-ex. and compiled with gcc from opensuse 11.3.
>>
>> if compile kernel from fedora 14, there will be not delay.
>
> Thank you for the info.  Could you please send me the .config files from
> both builds?

The same one I sent out before, but with DEBUG_LOCKING_API_SELFTESTS disabled.

Thanks

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-20 23:16 ` Yinghai Lu @ 2011-05-20 23:49 ` Paul E. McKenney 2011-05-21 0:02 ` Yinghai Lu 0 siblings, 1 reply; 45+ messages in thread From: Paul E. McKenney @ 2011-05-20 23:49 UTC (permalink / raw) To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo, linux-tip-commits On Fri, May 20, 2011 at 04:16:28PM -0700, Yinghai Lu wrote: > On 05/20/2011 04:14 PM, Paul E. McKenney wrote: > > On Fri, May 20, 2011 at 04:09:22PM -0700, Yinghai Lu wrote: > >> On 05/20/2011 03:42 PM, Paul E. McKenney wrote: > >>> On Fri, May 20, 2011 at 02:04:06PM -0700, Yinghai Lu wrote: > >>>> On 05/19/2011 02:37 PM, tip-bot for Paul E. McKenney wrote: > >>>>> Commit-ID: 80d02085d99039b3b7f3a73c8896226b0cb1ba07 > >>>>> Gitweb: http://git.kernel.org/tip/80d02085d99039b3b7f3a73c8896226b0cb1ba07 > >>>>> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > >>>>> AuthorDate: Thu, 12 May 2011 01:08:07 -0700 > >>>>> Committer: Ingo Molnar <mingo@elte.hu> > >>>>> CommitDate: Thu, 19 May 2011 23:25:29 +0200 > >>>>> > >>>>> Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" > >>>>> > >>>>> This reverts commit e59fb3120becfb36b22ddb8bd27d065d3cdca499. > >>>>> > >>>>> This reversion was due to (extreme) boot-time slowdowns on SPARC seen by > >>>>> Yinghai Lu and on x86 by Ingo > >>>>> . > >>>>> This is a non-trivial reversion due to intervening commits. 
> >>>>> > >>>>> Conflicts: > >>>>> > >>>>> Documentation/RCU/trace.txt > >>>>> kernel/rcutree.c > >>>>> > >>>>> Signed-off-by: Ingo Molnar <mingo@elte.hu> > >>>>> --- > >>>>> Documentation/RCU/trace.txt | 17 ++++-- > >>>>> kernel/rcutree.c | 130 ++++++++++++++++++++++++------------------ > >>>>> kernel/rcutree.h | 9 ++- > >>>>> kernel/rcutree_plugin.h | 7 +- > >>>>> kernel/rcutree_trace.c | 12 ++-- > >>>>> 5 files changed, 102 insertions(+), 73 deletions(-) > >>>>> > >>>> > >>>> current tip/master that have this reverting and without setting DEBUG_LOCKING_API_SELFTESTS > >>>> > >>>> still get delay > >>>> > >>>> [ 35.419453] cpu_dev_init done > >>>> [ 128.981770] memory_dev_init done > >>>> > >>>> should take only 3 or 4 seconds. > >>> > >>> Thank you for checking this out. > >>> > >>> This is with the same configuration you sent earlier? (Other than the > >>> DEBUG_LOCKING_API_SELFTESTS, of course.) > >>> > >> > >> 8 socket westmere-ex. and compiled with gcc from opensuse 11.3. > >> > >> if compile kernel from fedora 14, there will be not delay. > > > > Thank you for the info. Could you please send me the .config files from > > both builds? > > the same one i sent out before, but let DEBUG_LOCKING_API_SELFTESTS disabled. OK, just to make sure I understand... You are compiling exactly the same kernel source tree with exactly the same .config, just with two different versions of gcc, correct? If so, it is quite possible that the slow one is the correct one. :-/ Thanx, Paul ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  2011-05-20 23:49 ` Paul E. McKenney
@ 2011-05-21  0:02 ` Yinghai Lu
  2011-05-21 13:18   ` Paul E. McKenney
  0 siblings, 1 reply; 45+ messages in thread
From: Yinghai Lu @ 2011-05-21  0:02 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel, mingo, hpa, tglx, mingo

On 05/20/2011 04:49 PM, Paul E. McKenney wrote:
> On Fri, May 20, 2011 at 04:16:28PM -0700, Yinghai Lu wrote:
...
>>
>> the same one i sent out before, but let DEBUG_LOCKING_API_SELFTESTS disabled.
>
> OK, just to make sure I understand...  You are compiling exactly the
> same kernel source tree with exactly the same .config, just with two
> different versions of gcc, correct?

Yes.

>
> If so, it is quite possible that the slow one is the correct one.  :-/

Yeah, new versions always have problems.

It looks like openSUSE 11.3 has gcc 4.5.0 and Fedora 14 has gcc 4.5.1.

Yinghai

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-21 0:02 ` Yinghai Lu @ 2011-05-21 13:18 ` Paul E. McKenney 2011-05-21 14:08 ` Paul E. McKenney 0 siblings, 1 reply; 45+ messages in thread From: Paul E. McKenney @ 2011-05-21 13:18 UTC (permalink / raw) To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo On Fri, May 20, 2011 at 05:02:40PM -0700, Yinghai Lu wrote: > On 05/20/2011 04:49 PM, Paul E. McKenney wrote: > > On Fri, May 20, 2011 at 04:16:28PM -0700, Yinghai Lu wrote: > ... > >> > >> the same one i sent out before, but let DEBUG_LOCKING_API_SELFTESTS disabled. > > > > OK, just to make sure I understand... You are compiling exactly the > > same kernel source tree with exactly the same .config, just with two > > different versions of gcc, correct? > yes. > > > > If so, it is quite possible that the slow one is the correct one. :-/ > yeah, new version always have problem. > > looks like opensuse11.3 has 4.5.0 and fedora14 has 4.5.1 OK, so fedora14 is the fast one (4.5.1) and opensuse11.3 is the slow one (4.5.0), correct? Thanx, Paul ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  2011-05-21 13:18 ` Paul E. McKenney
@ 2011-05-21 14:08 ` Paul E. McKenney
  2011-05-23 20:14   ` Yinghai Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Paul E. McKenney @ 2011-05-21 14:08 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo

On Sat, May 21, 2011 at 06:18:44AM -0700, Paul E. McKenney wrote:
> On Fri, May 20, 2011 at 05:02:40PM -0700, Yinghai Lu wrote:
> > On 05/20/2011 04:49 PM, Paul E. McKenney wrote:
> > > On Fri, May 20, 2011 at 04:16:28PM -0700, Yinghai Lu wrote:
> > ...
> > >>
> > >> the same one i sent out before, but let DEBUG_LOCKING_API_SELFTESTS disabled.
> > >
> > > OK, just to make sure I understand...  You are compiling exactly the
> > > same kernel source tree with exactly the same .config, just with two
> > > different versions of gcc, correct?
> > yes.
> > >
> > > If so, it is quite possible that the slow one is the correct one.  :-/
> > yeah, new version always have problem.
> >
> > looks like opensuse11.3 has 4.5.0 and fedora14 has 4.5.1
>
> OK, so fedora14 is the fast one (4.5.1) and opensuse11.3 is the slow
> one (4.5.0), correct?

And does commit c7a3786030 help?  This commit (from Peter Zijlstra)
tidied up RCU kthreads' scheduler interactions.  The patch is below,
though it is probably more convenient to pull it from the rcu/next
branch of:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git

							Thanx, Paul

------------------------------------------------------------------------

rcu: Remove waitqueue usage for cpu, node, and boost kthreads

It is not necessary to use waitqueues for the RCU kthreads because
we always know exactly which thread is to be awakened.  In addition,
wake_up() only issues an actual wakeup when there is a thread waiting on
the queue, which was why there was an extra explicit wake_up_process()
to get the RCU kthreads started.

Eliminating the waitqueues (and wake_up()) in favor of wake_up_process()
eliminates the need for the initial wake_up_process() and also shrinks
the data structure size a bit.  The wakeup logic is placed in a new
rcu_wait() macro.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 5b0b482..a1a8bb6 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -94,7 +94,6 @@ static DEFINE_PER_CPU(struct task_struct *, rcu_cpu_kthread_task);
 DEFINE_PER_CPU(unsigned int, rcu_cpu_kthread_status);
 DEFINE_PER_CPU(int, rcu_cpu_kthread_cpu);
 DEFINE_PER_CPU(unsigned int, rcu_cpu_kthread_loops);
-static DEFINE_PER_CPU(wait_queue_head_t, rcu_cpu_wq);
 DEFINE_PER_CPU(char, rcu_cpu_has_work);
 static char rcu_kthreads_spawnable;
 
@@ -1475,7 +1474,7 @@ static void invoke_rcu_cpu_kthread(void)
 		local_irq_restore(flags);
 		return;
 	}
-	wake_up(&__get_cpu_var(rcu_cpu_wq));
+	wake_up_process(__this_cpu_read(rcu_cpu_kthread_task));
 	local_irq_restore(flags);
 }
 
@@ -1595,14 +1594,12 @@ static int rcu_cpu_kthread(void *arg)
 	unsigned long flags;
 	int spincnt = 0;
 	unsigned int *statusp = &per_cpu(rcu_cpu_kthread_status, cpu);
-	wait_queue_head_t *wqp = &per_cpu(rcu_cpu_wq, cpu);
 	char work;
 	char *workp = &per_cpu(rcu_cpu_has_work, cpu);
 
 	for (;;) {
 		*statusp = RCU_KTHREAD_WAITING;
-		wait_event_interruptible(*wqp,
-					 *workp != 0 || kthread_should_stop());
+		rcu_wait(*workp != 0 || kthread_should_stop());
 		local_bh_disable();
 		if (rcu_cpu_kthread_should_stop(cpu)) {
 			local_bh_enable();
@@ -1653,7 +1650,6 @@ static int __cpuinit rcu_spawn_one_cpu_kthread(int cpu)
 	per_cpu(rcu_cpu_kthread_cpu, cpu) = cpu;
 	WARN_ON_ONCE(per_cpu(rcu_cpu_kthread_task, cpu) != NULL);
 	per_cpu(rcu_cpu_kthread_task, cpu) = t;
-	wake_up_process(t);
 	sp.sched_priority = RCU_KTHREAD_PRIO;
 	sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
 	return 0;
@@ -1676,8 +1672,7 @@ static int rcu_node_kthread(void *arg)
 
 	for (;;) {
 		rnp->node_kthread_status = RCU_KTHREAD_WAITING;
-		wait_event_interruptible(rnp->node_wq,
-					 atomic_read(&rnp->wakemask) != 0);
+		rcu_wait(atomic_read(&rnp->wakemask) != 0);
 		rnp->node_kthread_status = RCU_KTHREAD_RUNNING;
 		raw_spin_lock_irqsave(&rnp->lock, flags);
 		mask = atomic_xchg(&rnp->wakemask, 0);
@@ -1761,7 +1756,6 @@ static int __cpuinit rcu_spawn_one_node_kthread(struct rcu_state *rsp,
 		raw_spin_lock_irqsave(&rnp->lock, flags);
 		rnp->node_kthread_task = t;
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
-		wake_up_process(t);
 		sp.sched_priority = 99;
 		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
 	}
@@ -1778,21 +1772,16 @@ static int __init rcu_spawn_kthreads(void)
 
 	rcu_kthreads_spawnable = 1;
 	for_each_possible_cpu(cpu) {
-		init_waitqueue_head(&per_cpu(rcu_cpu_wq, cpu));
 		per_cpu(rcu_cpu_has_work, cpu) = 0;
 		if (cpu_online(cpu))
 			(void)rcu_spawn_one_cpu_kthread(cpu);
 	}
 	rnp = rcu_get_root(rcu_state);
-	init_waitqueue_head(&rnp->node_wq);
-	rcu_init_boost_waitqueue(rnp);
 	(void)rcu_spawn_one_node_kthread(rcu_state, rnp);
-	if (NUM_RCU_NODES > 1)
-		rcu_for_each_leaf_node(rcu_state, rnp) {
-			init_waitqueue_head(&rnp->node_wq);
-			rcu_init_boost_waitqueue(rnp);
+	if (NUM_RCU_NODES > 1) {
+		rcu_for_each_leaf_node(rcu_state, rnp)
 			(void)rcu_spawn_one_node_kthread(rcu_state, rnp);
-		}
+	}
 	return 0;
 }
 early_initcall(rcu_spawn_kthreads);
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 561dcb9..7b9a08b 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -159,9 +159,6 @@ struct rcu_node {
 	struct task_struct *boost_kthread_task;
 				/* kthread that takes care of priority */
 				/*  boosting for this rcu_node structure. */
-	wait_queue_head_t boost_wq;
-				/* Wait queue on which to park the boost */
-				/*  kthread. */
 	unsigned int boost_kthread_status;
 				/* State of boost_kthread_task for tracing. */
 	unsigned long n_tasks_boosted;
@@ -188,9 +185,6 @@ struct rcu_node {
 				/* kthread that takes care of this rcu_node */
 				/*  structure, for example, awakening the */
 				/*  per-CPU kthreads as needed. */
-	wait_queue_head_t node_wq;
-				/* Wait queue on which to park the per-node */
-				/*  kthread. */
 	unsigned int node_kthread_status;
 				/* State of node_kthread_task for tracing. */
 } ____cacheline_internodealigned_in_smp;
@@ -336,6 +330,16 @@ struct rcu_data {
 					/* scheduling clock irq */
 					/* before ratting on them. */
 
+#define rcu_wait(cond)							\
+do {									\
+	for (;;) {							\
+		set_current_state(TASK_INTERRUPTIBLE);			\
+		if (cond)						\
+			break;						\
+		schedule();						\
+	}								\
+	__set_current_state(TASK_RUNNING);				\
+} while (0)
 
 /*
  * RCU global state, including node hierarchy.  This hierarchy is
@@ -445,7 +449,6 @@ static void __cpuinit rcu_preempt_init_percpu_data(int cpu);
 static void rcu_preempt_send_cbs_to_online(void);
 static void __init __rcu_init_preempt(void);
 static void rcu_needs_cpu_flush(void);
-static void __init rcu_init_boost_waitqueue(struct rcu_node *rnp);
 static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags);
 static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp,
 					  cpumask_var_t cm);
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index ed339702..049f278 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1196,8 +1196,7 @@ static int rcu_boost_kthread(void *arg)
 
 	for (;;) {
 		rnp->boost_kthread_status = RCU_KTHREAD_WAITING;
-		wait_event_interruptible(rnp->boost_wq, rnp->boost_tasks ||
-							rnp->exp_tasks);
+		rcu_wait(rnp->boost_tasks || rnp->exp_tasks);
 		rnp->boost_kthread_status = RCU_KTHREAD_RUNNING;
 		more2boost = rcu_boost(rnp);
 		if (more2boost)
@@ -1275,14 +1274,6 @@ static void rcu_preempt_boost_start_gp(struct rcu_node *rnp)
 }
 
 /*
- * Initialize the RCU-boost waitqueue.
- */
-static void __init rcu_init_boost_waitqueue(struct rcu_node *rnp)
-{
-	init_waitqueue_head(&rnp->boost_wq);
-}
-
-/*
  * Create an RCU-boost kthread for the specified node if one does not
  * already exist.  We only create this kthread for preemptible RCU.
  * Returns zero if all is well, a negated errno otherwise.
@@ -1306,7 +1297,6 @@ static int __cpuinit rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	rnp->boost_kthread_task = t;
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
-	wake_up_process(t);
 	sp.sched_priority = RCU_KTHREAD_PRIO;
 	sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
 	return 0;
@@ -1328,10 +1318,6 @@ static void rcu_preempt_boost_start_gp(struct rcu_node *rnp)
 {
 }
 
-static void __init rcu_init_boost_waitqueue(struct rcu_node *rnp)
-{
-}
-
 static int __cpuinit rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
 						 struct rcu_node *rnp,
 						 int rnp_index)

^ permalink raw reply related	[flat|nested] 45+ messages in thread
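The changelog's point about wake_up() versus wake_up_process() can be sketched with a toy single-threaded model in userspace C. This is only an illustration under stated assumptions: every `toy_` name below is hypothetical, nothing here is kernel code, and real waitqueues hold a list of waiters rather than the single slot assumed here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these names are kernel APIs. */
enum toy_state { TOY_SLEEPING, TOY_RUNNABLE };

struct toy_task {
	enum toy_state state;
};

struct toy_waitqueue {
	struct toy_task *waiter;	/* NULL when no task is parked here */
};

/* Models wake_up(): only a task already parked on the queue is woken. */
static bool toy_wake_up(struct toy_waitqueue *wq)
{
	if (wq->waiter == NULL)
		return false;		/* nobody waiting: nothing happens */
	wq->waiter->state = TOY_RUNNABLE;
	wq->waiter = NULL;
	return true;
}

/* Models wake_up_process(): the caller names the task directly. */
static bool toy_wake_up_process(struct toy_task *t)
{
	bool was_sleeping = (t->state == TOY_SLEEPING);

	t->state = TOY_RUNNABLE;
	return was_sleeping;
}

/*
 * A freshly created thread that has never run cannot yet have parked
 * itself on a queue. Returns true when the queue wakeup misses it and
 * the direct wakeup then starts it.
 */
static bool toy_fresh_thread_needs_direct_wakeup(void)
{
	struct toy_task t = { .state = TOY_SLEEPING };
	struct toy_waitqueue wq = { .waiter = NULL };

	bool queue_woke = toy_wake_up(&wq);
	bool still_asleep = (t.state == TOY_SLEEPING);
	bool direct_woke = toy_wake_up_process(&t);

	return !queue_woke && still_asleep && direct_woke &&
	       t.state == TOY_RUNNABLE;
}
```

In this model a queue-based wakeup cannot start a thread that has not yet waited, which matches the changelog's reason for the extra initial wake_up_process(). Once every wakeup names the task directly, that special case goes away.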
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-21 14:08 ` Paul E. McKenney @ 2011-05-23 20:14 ` Yinghai Lu 2011-05-23 21:25 ` Paul E. McKenney 0 siblings, 1 reply; 45+ messages in thread From: Yinghai Lu @ 2011-05-23 20:14 UTC (permalink / raw) To: paulmck; +Cc: linux-kernel, mingo, hpa, tglx, mingo On 05/21/2011 07:08 AM, Paul E. McKenney wrote: > On Sat, May 21, 2011 at 06:18:44AM -0700, Paul E. McKenney wrote: >> On Fri, May 20, 2011 at 05:02:40PM -0700, Yinghai Lu wrote: >>> On 05/20/2011 04:49 PM, Paul E. McKenney wrote: >>>> On Fri, May 20, 2011 at 04:16:28PM -0700, Yinghai Lu wrote: >>> ... >>>>> >>>>> the same one i sent out before, but let DEBUG_LOCKING_API_SELFTESTS disabled. >>>> >>>> OK, just to make sure I understand... You are compiling exactly the >>>> same kernel source tree with exactly the same .config, just with two >>>> different versions of gcc, correct? >>> yes. >>>> >>>> If so, it is quite possible that the slow one is the correct one. :-/ >>> yeah, new version always have problem. >>> >>> looks like opensuse11.3 has 4.5.0 and fedora14 has 4.5.1 >> >> OK, so fedora14 is the fast one (4.5.1) and opensuse11.3 is the slow >> one (4.5.0), correct? > > And does commit c7a3786030 help? This commit (from Peter Zijlstra) > tidied up RCU kthreads' scheduler interactions. The patch is below, > though it is probably more convenient to pull it from the rcu/next > branch of: > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git > [ 337.132517] INFO: task rcun0:8 blocked for more than 120 seconds. [ 337.133238] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 337.160396] rcun0 D 0000000000000000 0 8 2 0x00000000 [ 337.161232] ffff882070d3fe90 0000000000000046 ffff882070d3e000 0000000000004000 [ 337.161291] 00000000001d1f80 ffff882070d3ffd8 00000000001d1f80 ffff882070d3ffd8 [ 337.161348] 0000000000004000 00000000001d1f80 ffff882070d18000 ffff882070d422b0 [ 337.161404] Call Trace: [ 337.161433] [<ffffffff810afab6>] ? __lock_release+0x166/0x16f [ 337.161459] [<ffffffff81c1dae1>] ? _raw_spin_unlock_irqrestore+0x3f/0x46 [ 337.161486] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137 [ 337.161512] [<ffffffff810add8a>] ? trace_hardirqs_on+0xd/0xf [ 337.161533] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137 [ 337.161558] [<ffffffff81099e41>] kthread+0x8c/0xa8 [ 337.161584] [<ffffffff81c257d4>] kernel_thread_helper+0x4/0x10 [ 337.161606] [<ffffffff81c1dd80>] ? retint_restore_args+0xe/0xe [ 337.161627] [<ffffffff81099db5>] ? __init_kthread_worker+0x5b/0x5b [ 337.161645] [<ffffffff81c257d0>] ? gs_change+0xb/0xb [ 337.161651] no locks held by rcun0/8. [ 337.161723] INFO: task rcun3:171 blocked for more than 120 seconds. [ 337.161729] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 337.161736] rcun3 D 0000000000000000 0 171 2 0x00000000 [ 337.161748] ffff8820705bfe90 0000000000000046 ffff8820705be000 0000000000004000 [ 337.161786] 00000000001d1f80 ffff8820705bffd8 00000000001d1f80 ffff8820705bffd8 [ 337.161825] 0000000000004000 00000000001d1f80 ffff882070d18000 ffff8820705ca2b0 [ 337.161863] Call Trace: [ 337.161879] [<ffffffff810afab6>] ? __lock_release+0x166/0x16f [ 337.161895] [<ffffffff81c1dae1>] ? _raw_spin_unlock_irqrestore+0x3f/0x46 [ 337.161913] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137 [ 337.161930] [<ffffffff810add8a>] ? trace_hardirqs_on+0xd/0xf [ 337.161947] [<ffffffff810ce633>] ? 
rcu_cpu_kthread_should_stop+0x137/0x137 [ 337.161963] [<ffffffff81099e41>] kthread+0x8c/0xa8 [ 337.161979] [<ffffffff81c257d4>] kernel_thread_helper+0x4/0x10 [ 337.161995] [<ffffffff81c1dd80>] ? retint_restore_args+0xe/0xe [ 337.162011] [<ffffffff81099db5>] ? __init_kthread_worker+0x5b/0x5b [ 337.162026] [<ffffffff81c257d0>] ? gs_change+0xb/0xb [ 337.162032] no locks held by rcun3/171. [ 337.162110] INFO: task rcun5:333 blocked for more than 120 seconds. [ 337.162117] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 337.162123] rcun5 D 0000000000000000 0 333 2 0x00000000 [ 337.162134] ffff882070079e90 0000000000000046 ffff882070078000 0000000000004000 [ 337.162173] 00000000001d1f80 ffff882070079fd8 00000000001d1f80 ffff882070079fd8 [ 337.162213] 0000000000004000 00000000001d1f80 ffff882070d18000 ffff882070070000 [ 337.162250] Call Trace: [ 337.162265] [<ffffffff810afab6>] ? __lock_release+0x166/0x16f [ 337.162282] [<ffffffff81c1dae1>] ? _raw_spin_unlock_irqrestore+0x3f/0x46 [ 337.162299] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137 [ 337.162316] [<ffffffff810add8a>] ? trace_hardirqs_on+0xd/0xf [ 337.162333] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137 [ 337.162349] [<ffffffff81099e41>] kthread+0x8c/0xa8 [ 337.162366] [<ffffffff81c257d4>] kernel_thread_helper+0x4/0x10 [ 337.162382] [<ffffffff81c1dd80>] ? retint_restore_args+0xe/0xe [ 337.162399] [<ffffffff81099db5>] ? __init_kthread_worker+0x5b/0x5b [ 337.162415] [<ffffffff81c257d0>] ? gs_change+0xb/0xb [ 337.162421] no locks held by rcun5/333. [ 337.162469] INFO: task rcun6:414 blocked for more than 120 seconds. [ 337.162475] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 337.162481] rcun6 D 0000000000000000 0 414 2 0x00000000
[ 337.162493] ffff8820701b3e90 0000000000000046 ffff8820701b2000 0000000000004000
[ 337.162532] 00000000001d1f80 ffff8820701b3fd8 00000000001d1f80 ffff8820701b3fd8
[ 337.162571] 0000000000004000 00000000001d1f80 ffff882070d18000 ffff8820701ac560
[ 337.162610] Call Trace:
[ 337.162625] [<ffffffff810afab6>] ? __lock_release+0x166/0x16f
[ 337.162640] [<ffffffff81c1dae1>] ? _raw_spin_unlock_irqrestore+0x3f/0x46
[ 337.162659] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137
[ 337.162676] [<ffffffff810add8a>] ? trace_hardirqs_on+0xd/0xf
[ 337.162692] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137
[ 337.162709] [<ffffffff81099e41>] kthread+0x8c/0xa8
[ 337.162726] [<ffffffff81c257d4>] kernel_thread_helper+0x4/0x10
[ 337.162741] [<ffffffff81c1dd80>] ? retint_restore_args+0xe/0xe
[ 337.162757] [<ffffffff81099db5>] ? __init_kthread_worker+0x5b/0x5b
[ 337.162773] [<ffffffff81c257d0>] ? gs_change+0xb/0xb
[ 337.162779] no locks held by rcun6/414.
[ 337.162816] INFO: task rcun7:495 blocked for more than 120 seconds.
[ 337.162822] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 337.162828] rcun7 D 0000000000000000 0 495 2 0x00000000
[ 337.162839] ffff8820703d1e90 0000000000000046 ffff8820703d0000 0000000000004000
[ 337.162880] 00000000001d1f80 ffff8820703d1fd8 00000000001d1f80 ffff8820703d1fd8
[ 337.162919] 0000000000004000 00000000001d1f80 ffff882070d18000 ffff8820703ca2b0
[ 337.162957] Call Trace:
[ 337.162973] [<ffffffff810afab6>] ? __lock_release+0x166/0x16f
[ 337.162989] [<ffffffff81c1dae1>] ? _raw_spin_unlock_irqrestore+0x3f/0x46
[ 337.163006] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137
[ 337.163022] [<ffffffff810add8a>] ? trace_hardirqs_on+0xd/0xf
[ 337.163039] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137
[ 337.163055] [<ffffffff81099e41>] kthread+0x8c/0xa8
[ 337.163072] [<ffffffff81c257d4>] kernel_thread_helper+0x4/0x10
[ 337.163087] [<ffffffff81c1dd80>] ? retint_restore_args+0xe/0xe
[ 337.163104] [<ffffffff81099db5>] ? __init_kthread_worker+0x5b/0x5b
[ 337.163120] [<ffffffff81c257d0>] ? gs_change+0xb/0xb
[ 337.163125] no locks held by rcun7/495.
[ 337.163166] INFO: task rcun8:576 blocked for more than 120 seconds.
[ 337.163172] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 337.163178] rcun8 D 0000000000000000 0 576 2 0x00000000
[ 337.163190] ffff881fff913e90 0000000000000046 ffff881fff912000 0000000000004000
[ 337.163231] 00000000001d1f80 ffff881fff913fd8 00000000001d1f80 ffff881fff913fd8
[ 337.163271] 0000000000004000 00000000001d1f80 ffff882070d18000 ffff881fff90a2b0
[ 337.163309] Call Trace:
[ 337.163324] [<ffffffff810afab6>] ? __lock_release+0x166/0x16f
[ 337.163340] [<ffffffff81c1dae1>] ? _raw_spin_unlock_irqrestore+0x3f/0x46
[ 337.163358] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137
[ 337.163375] [<ffffffff810add8a>] ? trace_hardirqs_on+0xd/0xf
[ 337.163392] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137
[ 337.163409] [<ffffffff81099e41>] kthread+0x8c/0xa8
[ 337.163425] [<ffffffff81c257d4>] kernel_thread_helper+0x4/0x10
[ 337.163440] [<ffffffff81c1dd80>] ? retint_restore_args+0xe/0xe
[ 337.163457] [<ffffffff81099db5>] ? __init_kthread_worker+0x5b/0x5b
[ 337.163473] [<ffffffff81c257d0>] ? gs_change+0xb/0xb
[ 337.163479] no locks held by rcun8/576.
[ 337.163558] INFO: task rcun10:738 blocked for more than 120 seconds.
[ 337.163564] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 337.163570] rcun10 D 0000000000000000 0 738 2 0x00000000
[ 337.163582] ffff881fffbb5e90 0000000000000046 ffff881fffbb4000 0000000000004000
[ 337.163620] 00000000001d1f80 ffff881fffbb5fd8 00000000001d1f80 ffff881fffbb5fd8
[ 337.163659] 0000000000004000 00000000001d1f80 ffff882070d18000 ffff881fffbba2b0
[ 337.163697] Call Trace:
[ 337.163712] [<ffffffff810afab6>] ? __lock_release+0x166/0x16f
[ 337.163728] [<ffffffff81c1dae1>] ? _raw_spin_unlock_irqrestore+0x3f/0x46
[ 337.163744] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137
[ 337.163760] [<ffffffff810add8a>] ? trace_hardirqs_on+0xd/0xf
[ 337.163777] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137
[ 337.163793] [<ffffffff81099e41>] kthread+0x8c/0xa8
[ 337.163810] [<ffffffff81c257d4>] kernel_thread_helper+0x4/0x10
[ 337.163826] [<ffffffff81c1dd80>] ? retint_restore_args+0xe/0xe
[ 337.163842] [<ffffffff81099db5>] ? __init_kthread_worker+0x5b/0x5b
[ 337.163857] [<ffffffff81c257d0>] ? gs_change+0xb/0xb
[ 337.163863] no locks held by rcun10/738.
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  2011-05-23 20:14 ` Yinghai Lu
@ 2011-05-23 21:25 ` Paul E. McKenney
  2011-05-23 22:01 ` Yinghai Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Paul E. McKenney @ 2011-05-23 21:25 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo

On Mon, May 23, 2011 at 01:14:22PM -0700, Yinghai Lu wrote:
> On 05/21/2011 07:08 AM, Paul E. McKenney wrote:
> > On Sat, May 21, 2011 at 06:18:44AM -0700, Paul E. McKenney wrote:
> >> On Fri, May 20, 2011 at 05:02:40PM -0700, Yinghai Lu wrote:
> >>> On 05/20/2011 04:49 PM, Paul E. McKenney wrote:
> >>>> On Fri, May 20, 2011 at 04:16:28PM -0700, Yinghai Lu wrote:
> >>> ...
> >>>>>
> >>>>> the same one i sent out before, but let DEBUG_LOCKING_API_SELFTESTS disabled.
> >>>>
> >>>> OK, just to make sure I understand... You are compiling exactly the
> >>>> same kernel source tree with exactly the same .config, just with two
> >>>> different versions of gcc, correct?
> >>> yes.
> >>>>
> >>>> If so, it is quite possible that the slow one is the correct one. :-/
> >>> yeah, new version always have problem.
> >>>
> >>> looks like opensuse11.3 has 4.5.0 and fedora14 has 4.5.1
> >>
> >> OK, so fedora14 is the fast one (4.5.1) and opensuse11.3 is the slow
> >> one (4.5.0), correct?
> >
> > And does commit c7a3786030 help? This commit (from Peter Zijlstra)
> > tidied up RCU kthreads' scheduler interactions. The patch is below,
> > though it is probably more convenient to pull it from the rcu/next
> > branch of:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
> >

Thank you for testing this!

This is with the same config that you emailed out on May 12th?

In particular, CONFIG_TREE_RCU=y?

> [ 337.132517] INFO: task rcun0:8 blocked for more than 120 seconds.
> [ 337.133238] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 337.160396] rcun0 D 0000000000000000 0 8 2 0x00000000
> [ 337.161232] ffff882070d3fe90 0000000000000046 ffff882070d3e000 0000000000004000
> [ 337.161291] 00000000001d1f80 ffff882070d3ffd8 00000000001d1f80 ffff882070d3ffd8
> [ 337.161348] 0000000000004000 00000000001d1f80 ffff882070d18000 ffff882070d422b0
> [ 337.161404] Call Trace:
> [ 337.161433] [<ffffffff810afab6>] ? __lock_release+0x166/0x16f
> [ 337.161459] [<ffffffff81c1dae1>] ? _raw_spin_unlock_irqrestore+0x3f/0x46
> [ 337.161486] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137
> [ 337.161512] [<ffffffff810add8a>] ? trace_hardirqs_on+0xd/0xf
> [ 337.161533] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137
> [ 337.161558] [<ffffffff81099e41>] kthread+0x8c/0xa8
> [ 337.161584] [<ffffffff81c257d4>] kernel_thread_helper+0x4/0x10
> [ 337.161606] [<ffffffff81c1dd80>] ? retint_restore_args+0xe/0xe
> [ 337.161627] [<ffffffff81099db5>] ? __init_kthread_worker+0x5b/0x5b
> [ 337.161645] [<ffffffff81c257d0>] ? gs_change+0xb/0xb
> [ 337.161651] no locks held by rcun0/8.

This is quite surprising. The "rcun" kthreads invoke rcu_node_kthread(),
which does not call rcu_cpu_kthread_should_stop().

But perhaps the stack backtrace got confused.

Could you please try the following diagnostic patch to help me work out
where the rcun threads are getting stuck?

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index b2868ea..50883dd 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1675,11 +1675,15 @@ static int rcu_node_kthread(void *arg)
 
 	for (;;) {
 		rnp->node_kthread_status = RCU_KTHREAD_WAITING;
+		printk(KERN_INFO "rcun %p starting wait for work.\n", rnp);
 		rcu_wait(atomic_read(&rnp->wakemask) != 0);
+		printk(KERN_INFO "rcun %p completed wait for work.\n", rnp);
 		rnp->node_kthread_status = RCU_KTHREAD_RUNNING;
 		raw_spin_lock_irqsave(&rnp->lock, flags);
 		mask = atomic_xchg(&rnp->wakemask, 0);
+		printk(KERN_INFO "rcun %p initiating boost.\n", rnp);
 		rcu_initiate_boost(rnp, flags); /* releases rnp->lock. */
+		printk(KERN_INFO "rcun %p completed boost.\n", rnp);
 		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask >>= 1) {
 			if ((mask & 0x1) == 0)
 				continue;
@@ -1689,10 +1693,12 @@ static int rcu_node_kthread(void *arg)
 				preempt_enable();
 				continue;
 			}
+			printk(KERN_INFO "rcun %p awaking rcuc%d.\n", rnp, cpu);
 			per_cpu(rcu_cpu_has_work, cpu) = 1;
 			sp.sched_priority = RCU_KTHREAD_PRIO;
 			sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
 			preempt_enable();
+			printk(KERN_INFO "rcun %p awakened rcuc%d.\n", rnp, cpu);
 		}
 	}
 	/* NOTREACHED */

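The printk() calls in the diagnostic patch bracket each step of the rcu_node_kthread() loop, so the useful quantity in the resulting log is the time between two consecutive "rcun" messages from the same rcu_node. The following userspace C helper is only a minimal sketch of how such a log could be post-processed. It assumes each message sits on its own line in the form "[ <seconds>] rcun <pointer> <text>", which is what the printk() format strings above would produce with timestamps enabled. The names struct rcun_event, rcun_parse() and rcun_gap_ms() are hypothetical and are not kernel interfaces. Console lines that got interleaved with other output are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of the diagnostic output (hypothetical helper type). */
struct rcun_event {
	double ts;              /* printk timestamp, in seconds */
	unsigned long long rnp; /* rcu_node address as printed by %p */
	char what[64];          /* message text following the address */
};

/*
 * Parse "[ <ts>] rcun <hex> <text>".  Returns 1 and fills *ev on success,
 * 0 if the line is not one of the "rcun" diagnostic messages.  The
 * "%*1[ ]" directive demands a space right after "rcun", so hung-task
 * dump lines such as "rcun6 D ..." are rejected.
 */
int rcun_parse(const char *line, struct rcun_event *ev)
{
	double ts;
	unsigned long long rnp;
	int off = 0;
	size_t len;

	assert(line != NULL && ev != NULL);
	if (sscanf(line, " [ %lf ] rcun%*1[ ]%llx %n", &ts, &rnp, &off) != 2 ||
	    off == 0)
		return 0;

	ev->ts = ts;
	ev->rnp = rnp;
	snprintf(ev->what, sizeof(ev->what), "%s", line + off);

	/* Drop a trailing newline or carriage return, if any. */
	len = strlen(ev->what);
	while (len > 0 &&
	       (ev->what[len - 1] == '\n' || ev->what[len - 1] == '\r'))
		ev->what[--len] = '\0';
	return 1;
}

/*
 * Gap in milliseconds between two messages from the same rcu_node.
 * Returns -1.0 if either line fails to parse or the nodes differ.
 */
double rcun_gap_ms(const char *first, const char *second)
{
	struct rcun_event a, b;

	if (!rcun_parse(first, &a) || !rcun_parse(second, &b) ||
	    a.rnp != b.rnp)
		return -1.0;
	return (b.ts - a.ts) * 1000.0;
}
```

Passing an "awaking rcucN" line and the matching "awakened rcucN" line to rcun_gap_ms() gives the wall-clock time between those two printk() calls. Note that this interval includes the cost of printk() itself (for example, output to a slow serial console), so it is an upper bound on the time spent in the code between them rather than an exact measurement.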
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  2011-05-23 21:25 ` Paul E. McKenney
@ 2011-05-23 22:01 ` Yinghai Lu
  2011-05-23 22:55 ` Yinghai Lu
  2011-05-24 1:12 ` Paul E. McKenney
  0 siblings, 2 replies; 45+ messages in thread
From: Yinghai Lu @ 2011-05-23 22:01 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel, mingo, hpa, tglx, mingo

On 05/23/2011 02:25 PM, Paul E. McKenney wrote:
> On Mon, May 23, 2011 at 01:14:22PM -0700, Yinghai Lu wrote:
>> On 05/21/2011 07:08 AM, Paul E. McKenney wrote:
>>> On Sat, May 21, 2011 at 06:18:44AM -0700, Paul E. McKenney wrote:
>>>> On Fri, May 20, 2011 at 05:02:40PM -0700, Yinghai Lu wrote:
>>>>> On 05/20/2011 04:49 PM, Paul E. McKenney wrote:
>>>>>> On Fri, May 20, 2011 at 04:16:28PM -0700, Yinghai Lu wrote:
>>>>> ...
>>>>>>>
>>>>>>> the same one i sent out before, but let DEBUG_LOCKING_API_SELFTESTS disabled.
>>>>>>
>>>>>> OK, just to make sure I understand... You are compiling exactly the
>>>>>> same kernel source tree with exactly the same .config, just with two
>>>>>> different versions of gcc, correct?
>>>>> yes.
>>>>>>
>>>>>> If so, it is quite possible that the slow one is the correct one. :-/
>>>>> yeah, new version always have problem.
>>>>>
>>>>> looks like opensuse11.3 has 4.5.0 and fedora14 has 4.5.1
>>>>
>>>> OK, so fedora14 is the fast one (4.5.1) and opensuse11.3 is the slow
>>>> one (4.5.0), correct?
>>>
>>> And does commit c7a3786030 help? This commit (from Peter Zijlstra)
>>> tidied up RCU kthreads' scheduler interactions. The patch is below,
>>> though it is probably more convenient to pull it from the rcu/next
>>> branch of:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
>>>
>
> Thank you for testing this!

it's ok. don't want to see our servers have problem with newer kernel.

>
> This is with the same config that you emailed out on May 12th?

yes.

>
> In particular, CONFIG_TREE_RCU=y?
>
>> [ 337.132517] INFO: task rcun0:8 blocked for more than 120 seconds.
>> [ 337.133238] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 337.160396] rcun0 D 0000000000000000 0 8 2 0x00000000
>> [ 337.161232] ffff882070d3fe90 0000000000000046 ffff882070d3e000 0000000000004000
>> [ 337.161291] 00000000001d1f80 ffff882070d3ffd8 00000000001d1f80 ffff882070d3ffd8
>> [ 337.161348] 0000000000004000 00000000001d1f80 ffff882070d18000 ffff882070d422b0
>> [ 337.161404] Call Trace:
>> [ 337.161433] [<ffffffff810afab6>] ? __lock_release+0x166/0x16f
>> [ 337.161459] [<ffffffff81c1dae1>] ? _raw_spin_unlock_irqrestore+0x3f/0x46
>> [ 337.161486] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137
>> [ 337.161512] [<ffffffff810add8a>] ? trace_hardirqs_on+0xd/0xf
>> [ 337.161533] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137
>> [ 337.161558] [<ffffffff81099e41>] kthread+0x8c/0xa8
>> [ 337.161584] [<ffffffff81c257d4>] kernel_thread_helper+0x4/0x10
>> [ 337.161606] [<ffffffff81c1dd80>] ? retint_restore_args+0xe/0xe
>> [ 337.161627] [<ffffffff81099db5>] ? __init_kthread_worker+0x5b/0x5b
>> [ 337.161645] [<ffffffff81c257d0>] ? gs_change+0xb/0xb
>> [ 337.161651] no locks held by rcun0/8.
>
> This is quite surprising. The "rcun" kthreads invoke rcu_node_kthread(),
> which does not call rcu_cpu_kthread_should_stop().
>
> But perhaps the stack backtrace got confused.
>
> Could you please try the following diagnostic patch to help me work out
> where the rcun threads are getting stuck?
>

[ 275.679636] RAMDISK: gzip image found at block 0
[ 277.504381] rcun ffffffff82440b00 starting wait for work.
[ 277.504693] rcun ffffffff82440b00 completed wait for work.
[ 277.504951] rcun ffffffff82440b00 initiating boost.
[ 277.515920] rcun ffffffff82440b00 completed boost.
[ 277.516157] rcun ffffffff82440b00 awaking rcuc2.
[ 277.535818] rcun ffffffff82440b00 awakened rcuc2.
[ 277.536075] rcun ffffffff82440b00 starting wait for work.
[ 277.604609] EXT3-fs: barriers not enabled [ 277.605278] kjournald starting. Commit interval 5 seconds [ 277.605473] EXT3-fs (ram0): warning: maximal mount count reached, running e2fsck is recommended [ 277.605493] EXT3-fs (ram0): using internal journal [ 277.605505] EXT3-fs (ram0): mounted filesystem with writeback data mode [ 277.605555] VFS: Mounted root (ext3 filesystem) on device 1:0. [ 277.605600] async_waiting @ 1 [ 277.605604] async_continuing @ 1 after 0 usec [ 277.669722] Freeing unused kernel memory: 2892k freed INIT: version 2.86 booting System Boot Control: Running /etc/init.d/boot Mounting procfs at /proc done Mounting sysfs at /sys done Mounting debugfs at /sys/kernel/debug done Mounting tmpfs at /dev done Initializing /dev done Mounting devpts at /dev/pts done Boot logging started on /dev/ttyS0(/dev/console) at Wed May 25 13:43:55 2011 FATAL: Could not load /lib/modules/2.6.39-tip-yh-05892-gb7b703b-dirty/modules.dep: No such file or directory Setting up the hardware clockmodprobe: FATAL: Could not load /lib/modules/2.6.39-tip-yh-05892-gb7b703b-dirty/modules.dep: No such file or directory hwclock: With --noadjfile, you must specify either --utc or --localtime ^[[8[ 277.836052] udevd (18674): /proc/18674/oom_adj is deprecated, please use /proc/18674/oom_score_adj instead. fa[ 277.845689] udevd version 128 started iled Disabling IP forwarding done done Starting udevd: done Loading drivers, configuring devices: [ 278.019705] rcun ffffffff82441300 starting wait for work. [ 278.020057] rcun ffffffff82441300 completed wait for work. [ 278.020314] rcun ffffffff82441300 initiating boost. [ 278.035052] rcun ffffffff82441300 completed boost. [ 278.035383] rcun ffffffff82441300 awaking rcuc141. [ 278.054945] rcun ffffffff82441300 awakened rcuc141. [ 278.055197] rcun ffffffff82441300 starting wait for work. [ 278.159361] rcun ffffffff82441100 starting wait for work. [ 278.159666] rcun ffffffff82441100 completed wait for work. 
[ 278.159949] rcun ffffffff82441100 initiating boost. [ 278.174805] rcun ffffffff82441100 completed boost. [ 278.175044] rcun ffffffff82441100 awaking rcuc111. [ 278.194728] rcun ffffffff82441100 awakened rcuc111. [ 278.194987] rcun ffffffff82441100 starting wait for work. [ 278.303039] rcun ffffffff82440c00 starting wait for work. [ 278.303323] rcun ffffffff82440c00 completed wait for work. [ 278.303568] rcun ffffffff82440c00 initiating boost. [ 278.314519] rcun ffffffff82440c00 completed boost. [ 278.314750] rcun ffffffff82440c00 awaking rcuc31. [ 278.334455] rcun ffffffff82440c00 awakened rcuc31. [ 278.334687] rcun ffffffff82440c00 starting wait for work. [ 278.498822] rcun ffffffff82441400 starting wait for work. [ 278.499131] rcun ffffffff82441400 completed wait for work. [ 278.499412] rcun ffffffff82441400 initiating boost. [ 278.514295] rcun ffffffff82441400 completed boost. [ 278.514632] rcun ffffffff82441400 awaking rcuc151. [ 278.534082] rcun ffffffff82441400 awakened rcuc151. [ 278.534338] rcun ffffffff82441400 starting wait for work. [ 278.686359] rcun ffffffff82440e00 starting wait for work. [ 278.686670] rcun ffffffff82440e00 completed wait for work. [ 278.686927] rcun ffffffff82440e00 initiating boost. [ 278.703910] rcun ffffffff82440e00 completed boost. [ 278.704148] rcun ffffffff82440e00 awaking rcuc51. [ 278.723778] rcun ffffffff82440e00 awakened rcuc51. [ 278.724036] rcun ffffffff82440e00 starting wait for work. [ 278.762564] rcun ffffffff82440e00 completed wait for work. [ 278.762863] rcun ffffffff82440e00 initiating boost. [ 278.763540] rcun ffffffff82440e00 completed boost. [ 278.763782] rcun ffffffff82440e00 awaking rcuc51. [ 278.764012] rcun ffffffff82440e00 awakened rcuc51. [ 278.783768] rcun ffffffff82440e00 starting wait for work. [ 278.784047] rcun ffffffff82440e00 completed wait for work. [ 278.803684] rcun ffffffff82440e00 initiating boost. [ 278.803937] rcun ffffffff82440e00 completed boost. 
[ 278.823598] rcun ffffffff82440e00 awaking rcuc51. [ 278.823851] rcun ffffffff82440e00 awakened rcuc51. [ 278.843603] rcun ffffffff82440e00 starting wait for work. [ 278.922171] rcun ffffffff82441300 completed wait for work. [ 278.922498] rcun ffffffff82441300 initiating boost. [ 278.922762] rcun ffffffff82441300 completed boost. [ 278.934066] rcun ffffffff82441300 awaking rcuc131. [ 278.934376] rcun ffffffff82441300 awakened rcuc131. [ 278.953529] rcun ffffffff82441300 starting wait for work. [ 279.017973] rcun ffffffff82440e00 completed wait for work. [ 279.018288] rcun ffffffff82440e00 initiating boost. [ 279.018515] rcun ffffffff82440e00 completed boost. [ 279.033336] rcun ffffffff82440e00 awaking rcuc51. [ 279.034041] rcun ffffffff82440e00 awakened rcuc51. [ 279.053188] rcun ffffffff82440e00 starting wait for work. [ 279.149846] rcun ffffffff82441300 completed wait for work. [ 279.150185] rcun ffffffff82441300 initiating boost. [ 279.150438] rcun ffffffff82441300 completed boost. [ 279.163080] rcun ffffffff82441300 awaking rcuc131. [ 279.163341] rcun ffffffff82441300 awakened rcuc131. [ 279.183285] rcun ffffffff82441300 starting wait for work. [ 279.313608] rcun ffffffff82441300 completed wait for work. [ 279.313983] rcun ffffffff82441300 initiating boost. [ 279.314216] rcun ffffffff82441300 completed boost. [ 279.332841] rcun ffffffff82441300 awaking rcuc131. [ 279.333359] rcun ffffffff82441300 awakened rcuc131. [ 279.352563] rcun ffffffff82441300 starting wait for work. [ 279.409412] rcun ffffffff82441300 completed wait for work. [ 279.409775] rcun ffffffff82441300 initiating boost. [ 279.410057] rcun ffffffff82441300 completed boost. [ 279.422561] rcun ffffffff82441300 awaking rcuc131. [ 279.422810] rcun ffffffff82441300 awakened rcuc131. [ 279.442473] rcun ffffffff82441300 starting wait for work. [ 279.932452] rcun ffffffff82441100 completed wait for work. [ 279.932806] rcun ffffffff82441100 initiating boost. 
[ 279.933047] rcun ffffffff82441100 completed boost. [ 279.952298] rcun ffffffff82441100 awaking rcuc110. [ 279.952658] rcun ffffffff82441100 awakened rcuc110. [ 279.971749] rcun ffffffff82441100 starting wait for work. [ 279.972249] rcun ffffffff82441100 completed wait for work. [ 279.991659] rcun ffffffff82441100 initiating boost. [ 279.992066] rcun ffffffff82441100 completed boost. [ 280.011403] rcun ffffffff82441100 awaking rcuc110. [ 280.011658] rcun ffffffff82441100 awakened rcuc110. [ 280.012070] rcun ffffffff82441100 starting wait for work. [ 280.112094] rcun ffffffff82440b00 completed wait for work. [ 280.112427] rcun ffffffff82440b00 initiating boost. [ 280.112674] rcun ffffffff82440b00 completed boost. [ 280.131375] rcun ffffffff82440b00 awaking rcuc11. [ 280.131651] rcun ffffffff82440b00 awakened rcuc11. [ 280.151524] rcun ffffffff82440b00 starting wait for work. [ 280.459704] rcun ffffffff82440b00 completed wait for work. [ 280.459997] rcun ffffffff82440b00 initiating boost. [ 280.460228] rcun ffffffff82440b00 completed boost. [ 280.470779] rcun ffffffff82440b00 awaking rcuc0. [ 280.471062] rcun ffffffff82440b00 awakened rcuc0. [ 280.490721] rcun ffffffff82440b00 starting wait for work. [ 280.567316] rcun ffffffff82441400 completed wait for work. [ 280.567647] rcun ffffffff82441400 initiating boost. [ 280.567897] rcun ffffffff82441400 completed boost. [ 280.580815] rcun ffffffff82441400 awaking rcuc151. [ 280.581116] rcun ffffffff82441400 awakened rcuc151. [ 280.600382] rcun ffffffff82441400 starting wait for work. [ 280.695170] rcun ffffffff82440f00 starting wait for work. [ 280.695506] rcun ffffffff82440f00 completed wait for work. [ 280.695847] rcun ffffffff82440f00 initiating boost. [ 280.710661] rcun ffffffff82440f00 completed boost. [ 280.711207] rcun ffffffff82440f00 awaking rcuc71. [ 280.730198] rcun ffffffff82440f00 awakened rcuc71. [ 280.730443] rcun ffffffff82440f00 starting wait for work. 
[ 281.601394] rcun ffffffff82440c00 completed wait for work. [ 281.601753] rcun ffffffff82440c00 initiating boost. [ 281.602004] rcun ffffffff82440c00 completed boost. [ 281.618891] rcun ffffffff82440c00 awaking rcuc30. [ 281.619164] rcun ffffffff82440c00 awakened rcuc30. [ 281.638755] rcun ffffffff82440c00 starting wait for work. [ 281.729334] rcun ffffffff82441300 completed wait for work. [ 281.729661] rcun ffffffff82441300 initiating boost. [ 281.729920] rcun ffffffff82441300 completed boost. [ 281.748587] rcun ffffffff82441300 awaking rcuc131. [ 281.748871] rcun ffffffff82441300 awakened rcuc131. [ 281.768287] rcun ffffffff82441300 starting wait for work. [ 281.905078] rcun ffffffff82440b00 completed wait for work. [ 281.905380] rcun ffffffff82440b00 initiating boost. [ 281.905623] rcun ffffffff82440b00 completed boost. [ 281.918170] rcun ffffffff82440b00 awaking rcuc11. [ 281.918450] rcun ffffffff82440b00 awakened rcuc11. [ 281.938055] rcun ffffffff82440b00 starting wait for work. [ 282.240380] rcun ffffffff82441300 completed wait for work. [ 282.240667] rcun ffffffff82441300 initiating boost. [ 282.240890] rcun ffffffff82441300 completed boost. [ 282.257498] rcun ffffffff82441300 awaking rcuc130. [ 282.257772] rcun ffffffff82441300 awakened rcuc130. [ 282.277380] rcun ffffffff82441300 starting wait for work. [ 282.304255] rcun ffffffff82441300 completed wait for work. [ 282.304551] rcun ffffffff82441300 initiating boost. [ 282.304792] rcun ffffffff82441300 completed boost. [ 282.317376] rcun ffffffff82441300 awaking rcuc130. [ 282.317639] rcun ffffffff82441300 awakened rcuc130. [ 282.337291] rcun ffffffff82441300 starting wait for work. [ 282.427834] rcun ffffffff82440e00 completed wait for work. [ 282.428165] rcun ffffffff82440e00 initiating boost. [ 282.428404] rcun ffffffff82440e00 completed boost. [ 282.447168] rcun ffffffff82440e00 awaking rcuc50. [ 282.447398] rcun ffffffff82440e00 awakened rcuc50. 
[ 282.467022] rcun ffffffff82440e00 starting wait for work. [ 282.543751] rcun ffffffff82441300 completed wait for work. [ 282.544030] rcun ffffffff82441300 initiating boost. [ 282.544262] rcun ffffffff82441300 completed boost. [ 282.556969] rcun ffffffff82441300 awaking rcuc130. [ 282.557221] rcun ffffffff82441300 awakened rcuc130. [ 282.576959] rcun ffffffff82441300 starting wait for work. [ 282.651510] rcun ffffffff82440e00 completed wait for work. [ 282.651859] rcun ffffffff82440e00 initiating boost. [ 282.652115] rcun ffffffff82440e00 completed boost. [ 282.666799] rcun ffffffff82440e00 awaking rcuc50. [ 282.667062] rcun ffffffff82440e00 awakened rcuc50. [ 282.686638] rcun ffffffff82440e00 starting wait for work. [ 283.469957] rcun ffffffff82440c00 completed wait for work. [ 283.470235] rcun ffffffff82440c00 initiating boost. [ 283.470457] rcun ffffffff82440c00 completed boost. [ 283.485312] rcun ffffffff82440c00 awaking rcuc20. [ 283.485548] rcun ffffffff82440c00 awakened rcuc20. [ 283.505188] rcun ffffffff82440c00 starting wait for work. [ 283.513893] rcun ffffffff82440c00 completed wait for work. [ 283.525161] rcun ffffffff82440c00 initiating boost. [ 283.525381] rcun ffffffff82440c00 completed boost. [ 283.545051] rcun ffffffff82440c00 awaking rcuc20. [ 283.545289] rcun ffffffff82440c00 awakened rcuc20. [ 283.545497] rcun ffffffff82440c00 starting wait for work. [ 283.577780] rcun ffffffff82440c00 completed wait for work. [ 283.578067] rcun ffffffff82440c00 initiating boost. [ 283.585133] rcun ffffffff82440c00 completed boost. [ 283.585366] rcun ffffffff82440c00 awaking rcuc20. [ 283.604979] rcun ffffffff82440c00 awakened rcuc20. [ 283.605219] rcun ffffffff82440c00 starting wait for work. [ 283.673621] rcun ffffffff82440c00 completed wait for work. [ 283.673904] rcun ffffffff82440c00 initiating boost. [ 283.674126] rcun ffffffff82440c00 completed boost. [ 283.684951] rcun ffffffff82440c00 awaking rcuc20. [ 283.685186] rcun ffffffff82440c00 awakened rcuc20. 
[ 283.704835] rcun ffffffff82440c00 starting wait for work. [ 283.721536] rcun ffffffff82440c00 completed wait for work. [ 283.724733] rcun ffffffff82440c00 initiating boost. [ 283.724974] rcun ffffffff82440c00 completed boost. [ 283.744676] rcun ffffffff82440c00 awaking rcuc20. [ 283.744921] rcun ffffffff82440c00 awakened rcuc20. [ 283.745142] rcun ffffffff82440c00 starting wait for work. [ 283.849306] rcun ffffffff82440c00 completed wait for work. [ 283.849580] rcun ffffffff82440c00 initiating boost. [ 283.849806] rcun ffffffff82440c00 completed boost. [ 283.864625] rcun ffffffff82440c00 awaking rcuc20. [ 283.864859] rcun ffffffff82440c00 awakened rcuc20. [ 283.884509] rcun ffffffff82440c00 starting wait for work. [ 283.897233] rcun ffffffff82440c00 completed wait for work. [ 283.904500] rcun ffffffff82440c00 initiating boost. [ 283.904740] rcun ffffffff82440c00 completed boost. [ 283.924388] rcun ffffffff82440c00 awaking rcuc20. [ 283.924639] rcun ffffffff82440c00 awakened rcuc20. [ 283.924857] rcun ffffffff82440c00 starting wait for work. [ 283.961137] rcun ffffffff82440c00 completed wait for work. [ 283.961412] rcun ffffffff82440c00 initiating boost. [ 283.964375] rcun ffffffff82440c00 completed boost. [ 283.964585] rcun ffffffff82440c00 awaking rcuc20. [ 283.984347] rcun ffffffff82440c00 awakened rcuc20. [ 283.984580] rcun ffffffff82440c00 starting wait for work. [ 284.064957] rcun ffffffff82440c00 completed wait for work. [ 284.065249] rcun ffffffff82440c00 initiating boost. [ 284.065462] rcun ffffffff82440c00 completed boost. [ 284.084281] rcun ffffffff82440c00 awaking rcuc20. [ 284.084506] rcun ffffffff82440c00 awakened rcuc20. [ 284.104135] rcun ffffffff82440c00 starting wait for work. [ 284.124914] rcun ffffffff82440c00 completed wait for work. [ 284.125251] rcun ffffffff82440c00 initiating boost. [ 284.125488] rcun ffffffff82440c00 completed boost. [ 284.144394] rcun ffffffff82440c00 awaking rcuc20. [ 284.144636] rcun ffffffff82440c00 awakened rcuc20. 
[ 284.163985] rcun ffffffff82440c00 starting wait for work. [ 284.352449] rcun ffffffff82440c00 completed wait for work. [ 284.352722] rcun ffffffff82440c00 initiating boost. [ 284.352965] rcun ffffffff82440c00 completed boost. [ 284.363729] rcun ffffffff82440c00 awaking rcuc21. [ 284.363952] rcun ffffffff82440c00 awakened rcuc21. [ 284.383609] rcun ffffffff82440c00 starting wait for work. [ 284.400355] rcun ffffffff82440c00 completed wait for work. [ 284.403541] rcun ffffffff82440c00 initiating boost. [ 284.403752] rcun ffffffff82440c00 completed boost. [ 284.423478] rcun ffffffff82440c00 awaking rcuc21. [ 284.423730] rcun ffffffff82440c00 awakened rcuc21. [ 284.423961] rcun ffffffff82440c00 starting wait for work. [ 284.464264] rcun ffffffff82440c00 completed wait for work. [ 284.464531] rcun ffffffff82440c00 initiating boost. [ 284.464739] rcun ffffffff82440c00 completed boost. [ 284.483510] rcun ffffffff82440c00 awaking rcuc21. [ 284.483725] rcun ffffffff82440c00 awakened rcuc21. [ 284.503396] rcun ffffffff82440c00 starting wait for work. [ 284.524166] rcun ffffffff82440c00 completed wait for work. [ 284.524430] rcun ffffffff82440c00 initiating boost. [ 284.524660] rcun ffffffff82440c00 completed boost. [ 284.543410] rcun ffffffff82440c00 awaking rcuc21. [ 284.543634] rcun ffffffff82440c00 awakened rcuc21. [ 284.563295] rcun ffffffff82440c00 starting wait for work. done Loading required kernel modules done Activating device mapper... FATAL: Could not load /lib/modules/2.6.39-tip-yh-05892-gb7b703b-dirty/modules.dep: No such file or directory failed Starting MD Raid unused Waiting for udev to settle... Scanning for LVM volume groups... File descriptor 3 left open Reading all physical volumes. This may take a while... Activating LVM volume groups... File descriptor 3 left open done Waiting for /firmware microcode . no more events Checking file systems... fsck 1.41.1 (01-Sep-2008) Checking all file systems. done done Mounting local file systems... 
/proc on /proc type proc (rw) sysfs on /sys type sysfs (rw) debugfs on /sys/kernel/debug type debugfs (rw) udev on /dev type tmpfs (rw) devpts on /dev/pts type devpts (rw,mode=0620,gid=5) /firmware on /lib/firmware type tmpfs (rw) microcode on /usr/lib/microcode type tmpfs (rw) done Activating remaining swap-devices in /etc/fstab... done Setting up linker cache (/etc/ld.so.cache) using ldconfig done Creating /var/log/boot.msg done Using boot-specified hostname 'lb-g5plus-1t-host' Setting up hostname 'lb-g5plus-1t-host' done Setting up loopback interface lo lo IP address: 127.0.0.1/8 IP address: 127.0.0.2/8 done System Boot Control: The system has been set up Skipped features: boot.md System Boot Control: Running /etc/init.d/boot.local done INIT: Entering runlevel: 3 Boot logging started on /dev/ttyS0(/dev/console) at Wed May 25 13:44:05 2011 Master Resource Control: previous runlevel: N, switching to runlevel:3 Starting D-Bus daemon done Initializing random number generator done [ 287.794818] rcun ffffffff82440f00 completed wait for work. [ 287.795218] rcun ffffffff82440f00 initiating boost. [ 287.795440] rcun ffffffff82440f00 completed boost. [ 287.807922] rcun ffffffff82440f00 awaking rcuc71. [ 287.808186] rcun ffffffff82440f00 awakened rcuc71. [ 287.808195] rcun ffffffff82440f00 starting wait for work. Starting syslog services done [ 288.693183] rcun ffffffff82440e00 completed wait for work. [ 288.693586] rcun ffffffff82440e00 initiating boost. [ 288.693810] rcun ffffffff82440e00 completed boost. [ 288.706108] rcun ffffffff82440e00 awaking rcuc50. [ 288.706352] rcun ffffffff82440e00 awakened rcuc50. [ 288.725895] rcun ffffffff82440e00 starting wait for work. [ 288.726166] rcun ffffffff82440e00 completed wait for work. [ 288.745842] rcun ffffffff82440e00 initiating boost. [ 288.746067] rcun ffffffff82440e00 completed boost. [ 288.765741] rcun ffffffff82440e00 awaking rcuc50. [ 288.765967] rcun ffffffff82440e00 awakened rcuc50. 
[ 288.766203] rcun ffffffff82440e00 starting wait for work. Loading CPUFreq modules done Starting HAL daemon done Setting up (localfs) network interfaces: lo lo IP address: 127.0.0.1/8 IP address: 127.0.0.2/8 done [ 289.323903] rcun ffffffff82440c00 completed wait for work. [ 289.324239] rcun ffffffff82440c00 initiating boost. [ 289.324458] rcun ffffffff82440c00 completed boost. [ 289.334891] rcun ffffffff82440c00 awaking rcuc20. [ 289.335112] rcun ffffffff82440c00 awakened rcuc20. [ 289.354785] rcun ffffffff82440c00 starting wait for work. eth0 device: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06) No configuration found for eth0 unused eth1 device: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06) No configuration found for eth1 unused [ 289.859076] rcun ffffffff82440c00 completed wait for work. [ 289.859428] rcun ffffffff82440c00 initiating boost. [ 289.859641] rcun ffffffff82440c00 completed boost. [ 289.873915] rcun ffffffff82440c00 awaking rcuc31. [ 289.874127] rcun ffffffff82440c00 awakened rcuc31. [ 289.893814] rcun ffffffff82440c00 starting wait for work. eth10 device: Intel Corporation 82576 Gigabit Network Connection (rev 01) No configuration found for eth10 unused [ 290.106822] rcun ffffffff82440b00 completed wait for work. [ 290.107203] rcun ffffffff82440b00 initiating boost. [ 290.107436] rcun ffffffff82440b00 completed boost. [ 290.123462] rcun ffffffff82440b00 awaking rcuc11. [ 290.123677] rcun ffffffff82440b00 awakened rcuc11. [ 290.143342] rcun ffffffff82440b00 starting wait for work. eth11 device: Intel Corporation 82576 Gigabit Network Connection (rev 01) No configuration found for eth11 unused [ 290.350351] rcun ffffffff82441100 completed wait for work. [ 290.350744] rcun ffffffff82441100 initiating boost. [ 290.351016] rcun ffffffff82441100 completed boost. [ 290.363076] rcun ffffffff82441100 awaking rcuc111. [ 290.363345] rcun ffffffff82441100 awakened rcuc111. 
[ 290.382943] rcun ffffffff82441100 starting wait for work. eth12 device: Intel Corporation 82599EB 10 Gigabit Network Connection (rev 01) No configuration found for eth12 unused [ 290.610137] rcun ffffffff82440b00 completed wait for work. [ 290.610487] rcun ffffffff82440b00 initiating boost. [ 290.610719] rcun ffffffff82440b00 completed boost. [ 290.622562] rcun ffffffff82440b00 awaking rcuc11. [ 290.622774] rcun ffffffff82440b00 awakened rcuc11. [ 290.642453] rcun ffffffff82440b00 starting wait for work. [ 290.642720] rcun ffffffff82440b00 completed wait for work. [ 290.662423] rcun ffffffff82440b00 initiating boost. [ 290.662643] rcun ffffffff82440b00 completed boost. eth13 device: Intel Corporation 82599EB 10 Gigabit Network Connection (rev 01) [ 290.682708] rcun ffffffff82440b00 awaking rcuc11. [ 290.702367] rcun ffffffff82440b00 awakened rcuc11. No configuration f[ 290.702694] rcun ffffffff82440b00 starting wait for work. ound for eth13 unused [ 290.865245] rcun ffffffff82440e00 completed wait for work. [ 290.865586] rcun ffffffff82440e00 initiating boost. [ 290.865802] rcun ffffffff82440e00 completed boost. [ 290.882235] rcun ffffffff82440e00 awaking rcuc51. [ 290.882530] rcun ffffffff82440e00 awakened rcuc51. [ 290.902004] rcun ffffffff82440e00 starting wait for work. eth14 device: Intel Corporation 82599EB 10-Gigabit KX4 Network Connection (rev 01) No configuration found for eth14 unused [ 291.128769] rcun ffffffff82440c00 completed wait for work. [ 291.129110] rcun ffffffff82440c00 initiating boost. [ 291.129333] rcun ffffffff82440c00 completed boost. [ 291.141675] rcun ffffffff82440c00 awaking rcuc31. [ 291.141904] rcun ffffffff82440c00 awakened rcuc31. [ 291.161527] rcun ffffffff82440c00 starting wait for work. eth15 device: Intel Corporation 82599EB 10-Gigabit KX4 Network Connection (rev 01) No configuration found for eth15 unused [ 291.380340] rcun ffffffff82440c00 completed wait for work. [ 291.380664] rcun ffffffff82440c00 initiating boost. 
[ 291.380883] rcun ffffffff82440c00 completed boost. [ 291.391214] rcun ffffffff82440c00 awaking rcuc21. [ 291.391431] rcun ffffffff82440c00 awakened rcuc21. [ 291.411105] rcun ffffffff82440c00 starting wait for work. [ 291.432271] rcun ffffffff82440c00 completed wait for work. [ 291.432551] rcun ffffffff82440c00 initiating boost. [ 291.432785] rcun ffffffff82440c00 completed boost. [ 291.451161] rcun ffffffff82440c00 awaking rcuc21. [ 291.451167] rcun ffffffff82440c00 awakened rcuc21. [ 291.451170] rcun ffffffff82440c00 starting wait for work. eth16 device: Intel Corporation 82599EB 10-Gigabit KX4 Network Connection (rev 01) No configuration found for eth16 unused [ 291.652073] rcun ffffffff82440b00 completed wait for work. [ 291.652448] rcun ffffffff82440b00 initiating boost. [ 291.652741] rcun ffffffff82440b00 completed boost. [ 291.670699] rcun ffffffff82440b00 awaking rcuc1. [ 291.670945] rcun ffffffff82440b00 awakened rcuc1. [ 291.690597] rcun ffffffff82440b00 starting wait for work. eth17 device: Intel Corporation 82599EB 10-Gigabit KX4 Network Connection (rev 01) No configuration found for eth17 unused [ 291.931570] rcun ffffffff82441000 starting wait for work. [ 291.931899] rcun ffffffff82441000 completed wait for work. [ 291.932142] rcun ffffffff82441000 initiating boost. [ 291.950219] rcun ffffffff82441000 completed boost. [ 291.950445] rcun ffffffff82441000 awaking rcuc90. [ 291.970127] rcun ffffffff82441000 awakened rcuc90. [ 291.970344] rcun ffffffff82441000 starting wait for work. [ 291.990092] rcun ffffffff82441000 completed wait for work. [ 291.990101] rcun ffffffff82441000 initiating boost. [ 291.990105] rcun ffffffff82441000 completed boost. [ 291.990109] rcun ffffffff82441000 awaking rcuc90. [ 291.990119] rcun ffffffff82441000 awakened rcuc90. [ 291.990123] rcun ffffffff82441000 starting wait for work. 
eth18 device: Intel Corporation 82599EB 10 Gigabit Network Connection (rev 01) No configuration found for eth18 unused eth19 device: Intel Corporation 82599EB 10 Gigabit Network Connection (rev 01) No configuration found for eth19 unused [ 292.470547] rcun ffffffff82440e00 completed wait for work. [ 292.470936] rcun ffffffff82440e00 initiating boost. [ 292.471191] rcun ffffffff82440e00 completed boost. [ 292.489291] rcun ffffffff82440e00 awaking rcuc51. [ 292.489550] rcun ffffffff82440e00 awakened rcuc51. [ 292.509226] rcun ffffffff82440e00 starting wait for work. [ 292.509478] rcun ffffffff82440e00 completed wait for work. [ 292.529136] rcun ffffffff82440e00 initiating boost. [ 292.529141] rcun ffffffff82440e00 completed boost. [ 292.529144] rcun ffffffff82440e00 awaking rcuc51. [ 292.529151] rcun ffffffff82440e00 awakened rcuc51. [ 292.529155] rcun ffffffff82440e00 starting wait for work. eth2 device: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06) No configuration found for eth2 unused [ 292.753993] rcun ffffffff82440c00 completed wait for work. [ 292.754348] rcun ffffffff82440c00 initiating boost. [ 292.754596] rcun ffffffff82440c00 completed boost. [ 292.768779] rcun ffffffff82440c00 awaking rcuc31. [ 292.769018] rcun ffffffff82440c00 awakened rcuc31. [ 292.788632] rcun ffffffff82440c00 starting wait for work. eth20 device: Intel Corporation 82599EB 10-Gigabit KX4 Network Connection (rev 01) No configuration found for eth20 unused [ 292.985585] rcun ffffffff82440c00 completed wait for work. [ 292.985925] rcun ffffffff82440c00 initiating boost. [ 292.986150] rcun ffffffff82440c00 completed boost. [ 292.998422] rcun ffffffff82440c00 awaking rcuc31. [ 292.998647] rcun ffffffff82440c00 awakened rcuc31. [ 293.018318] rcun ffffffff82440c00 starting wait for work. eth21 device: Intel Corporation 82599EB 10-Gigabit KX4 Network Connection (rev 01) No configuration found for eth21 unused [ 293.289596] rcun ffffffff82440b00 completed wait for work. 
[ 293.289987] rcun ffffffff82440b00 initiating boost. [ 293.290245] rcun ffffffff82440b00 completed boost. [ 293.307893] rcun ffffffff82440b00 awaking rcuc1. [ 293.308159] rcun ffffffff82440b00 awakened rcuc1. [ 293.327728] rcun ffffffff82440b00 starting wait for work. [ 293.337111] rcun ffffffff82440b00 completed wait for work. [ 293.347737] rcun ffffffff82440b00 initiating boost. [ 293.347958] rcun ffffffff82440b00 completed boost. eth22 de[ 293.367776] rcun ffffffff82440b00 awaking rcuc1. vice: Intel Corporation 82599EB [ 293.368186] rcun ffffffff82440b00 awakened rcuc1. 10 Gigabit Netwo[ 293.368191] rcun ffffffff82440b00 starting wait for work. rk Connection (rev 01) No configuration found for eth22 unused eth23 device: Intel Corporation 82599EB 10 Gigabit Network Connection (rev 01) No configuration found for eth23 unused eth24 device: Intel Corporation 82599EB 10-Gigabit KX4 Network Connection (rev 01) No configuration found for eth24 unused [ 294.075710] rcun ffffffff82440e00 completed wait for work. [ 294.076063] rcun ffffffff82440e00 initiating boost. [ 294.076287] rcun ffffffff82440e00 completed boost. [ 294.086523] rcun ffffffff82440e00 awaking rcuc51. [ 294.086800] rcun ffffffff82440e00 awakened rcuc51. [ 294.106385] rcun ffffffff82440e00 starting wait for work. [ 294.135592] rcun ffffffff82440e00 completed wait for work. [ 294.135876] rcun ffffffff82440e00 initiating boost. [ 294.136103] rcun ffffffff82440e00 completed boost. [ 294.146373] rcun ffffffff82440e00 awaking rcuc51. [ 294.146643] rcun ffffffff82440e00 awakened rcuc51. [ 294.166295] rcun ffffffff82440e00 starting wait for work. eth25 device: Intel Corporation 82599EB 10-Gigabit KX4 Network Connection (rev 01) No configuration found for eth25 unused [ 294.383392] rcun ffffffff82440e00 completed wait for work. [ 294.383728] rcun ffffffff82440e00 initiating boost. [ 294.383945] rcun ffffffff82440e00 completed boost. [ 294.395907] rcun ffffffff82440e00 awaking rcuc51. 
[ 294.396134] rcun ffffffff82440e00 awakened rcuc51. [ 294.415794] rcun ffffffff82440e00 starting wait for work. eth26 device: QLogic Corp. 10GbE Converged Network Adapter (TCP/IP Networking) (rev 02) No configuration found for eth26 unused [ 294.658799] rcun ffffffff82440c00 completed wait for work. [ 294.659130] rcun ffffffff82440c00 initiating boost. [ 294.659341] rcun ffffffff82440c00 completed boost. [ 294.675419] rcun ffffffff82440c00 awaking rcuc31. [ 294.675645] rcun ffffffff82440c00 awakened rcuc31. [ 294.695468] rcun ffffffff82440c00 starting wait for work. eth27 device: QLogic Corp. 10GbE Converged Network Adapter (TCP/IP Networking) (rev 02) No configuration found for eth27 unused [ 294.918239] rcun ffffffff82440b00 completed wait for work. [ 294.918602] rcun ffffffff82440b00 initiating boost. [ 294.918834] rcun ffffffff82440b00 completed boost. [ 294.934983] rcun ffffffff82440b00 awaking rcuc0. [ 294.935245] rcun ffffffff82440b00 awakened rcuc0. [ 294.954854] rcun ffffffff82440b00 starting wait for work. eth3 device: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06) No configuration found for eth3 unused [ 295.233919] rcun ffffffff82440d00 starting wait for work. [ 295.234268] rcun ffffffff82440d00 completed wait for work. [ 295.234550] rcun ffffffff82440d00 initiating boost. [ 295.244455] rcun ffffffff82440d00 completed boost. [ 295.244686] rcun ffffffff82440d00 awaking rcuc40. eth4 de[ 295.264423] rcun ffffffff82440d00 awakened rcuc40. vice: Intel Corporation 82576 Gi[ 295.264682] rcun ffffffff82440d00 starting wait for work. gabit Network Connection (rev 01) No configuration found for eth4 unused [ 295.477285] rcun ffffffff82440f00 completed wait for work. [ 295.477635] rcun ffffffff82440f00 initiating boost. [ 295.477861] rcun ffffffff82440f00 completed boost. [ 295.494030] rcun ffffffff82440f00 awaking rcuc70. [ 295.494278] rcun ffffffff82440f00 awakened rcuc70. [ 295.513989] rcun ffffffff82440f00 starting wait for work. 
eth5 device: Intel Corporation 82576 Gigabit Network Connection (rev 01) No configuration found for eth5 unused [ 295.720833] rcun ffffffff82440d00 completed wait for work. [ 295.721156] rcun ffffffff82440d00 initiating boost. [ 295.721376] rcun ffffffff82440d00 completed boost. [ 295.733558] rcun ffffffff82440d00 awaking rcuc41. [ 295.733799] rcun ffffffff82440d00 awakened rcuc41. [ 295.753424] rcun ffffffff82440d00 starting wait for work. eth6 device: Intel Corporation 82576 Gigabit Network Connection (rev 01) No configuration found for eth6 unused [ 296.000356] rcun ffffffff82440c00 completed wait for work. [ 296.000716] rcun ffffffff82440c00 initiating boost. [ 296.000942] rcun ffffffff82440c00 completed boost. [ 296.013086] rcun ffffffff82440c00 awaking rcuc31. [ 296.013305] rcun ffffffff82440c00 awakened rcuc31. [ 296.032935] rcun ffffffff82440c00 starting wait for work. eth7 device: Intel Corporation 82576 Gigabit Network Connection (rev 01) No configuration found for eth7 unused [ 296.220235] rcun ffffffff82440e00 completed wait for work. [ 296.220570] rcun ffffffff82440e00 initiating boost. [ 296.220796] rcun ffffffff82440e00 completed boost. [ 296.232694] rcun ffffffff82440e00 awaking rcuc51. [ 296.232934] rcun ffffffff82440e00 awakened rcuc51. [ 296.252558] rcun ffffffff82440e00 starting wait for work. [ 296.252841] rcun ffffffff82440e00 completed wait for work. [ 296.272515] rcun ffffffff82440e00 initiating boost. [ 296.272784] rcun ffffffff82440e00 completed boost. [ 296.292387] rcun ffffffff82440e00 awaking rcuc51. [ 296.292647] rcun ffffffff82440e00 awakened rcuc51. [ 296.292920] rcun ffffffff82440e00 starting wait for work. eth8 device: Intel Corporation 82576 Gigabit Network Connection (rev 01) No configuration found for eth8 unused [ 296.483877] rcun ffffffff82440b00 completed wait for work. [ 296.484237] rcun ffffffff82440b00 initiating boost. [ 296.484456] rcun ffffffff82440b00 completed boost. [ 296.502204] rcun ffffffff82440b00 awaking rcuc1. 
[ 296.502491] rcun ffffffff82440b00 awakened rcuc1. [ 296.522201] rcun ffffffff82440b00 starting wait for work. [ 296.522459] rcun ffffffff82440b00 completed wait for work. [ 296.542024] rcun ffffffff82440b00 initiating boost. [ 296.542317] rcun ffffffff82440b00 completed boost. [ 296.561935] rcun ffffffff82440b00 awaking rcuc1. [ 296.562213] rcun ffffffff82440b00 awakened rcuc1. [ 296.562431] rcun ffffffff82440b00 starting wait for work. eth9 device: Intel Corporation 82576 Gigabit Network Connection (rev 01) No configuration found for eth9 unused ib0 device: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0) No configuration found for ib0 unused [ 297.014805] rcun ffffffff82440b00 completed wait for work. [ 297.015156] rcun ffffffff82440b00 initiating boost. [ 297.015406] rcun ffffffff82440b00 completed boost. [ 297.031238] rcun ffffffff82440b00 awaking rcuc1. [ 297.031490] rcun ffffffff82440b00 awakened rcuc1. [ 297.051130] rcun ffffffff82440b00 starting wait for work. ib1 device: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0) No configuration found for ib1 unused Setting up service (localfs) network . . . . . . . . . . done Starting RPC portmap daemon done Setting up (remotefs) network interfaces: Setting up service (remotefs) network . . . . . . . . . . done Master Resource Control: runlevel 3 has been reached [ 298.843747] rcun ffffffff82440e00 completed wait for work. [ 298.844125] rcun ffffffff82440e00 initiating boost. [ 298.844380] rcun ffffffff82440e00 completed boost. [ 298.858327] rcun ffffffff82440e00 awaking rcuc52. [ 298.858648] rcun ffffffff82440e00 awakened rcuc52. [ 298.878014] rcun ffffffff82440e00 starting wait for work. [ 298.887761] rcun ffffffff82440c00 completed wait for work. [ 298.898179] rcun ffffffff82440c00 initiating boost. [ 298.898480] rcun ffffffff82440c00 completed boost. [ 298.917825] rcun ffffffff82440c00 awaking rcuc30. 
[ 298.918148] rcun ffffffff82440c00 awakened rcuc30. [ 298.918399] rcun ffffffff82440c00 starting wait for work. [ 299.051210] rcun ffffffff82441300 completed wait for work. [ 299.051271] rcun ffffffff82440d00 completed wait for work. [ 299.051279] rcun ffffffff82440d00 initiating boost. [ 299.051283] rcun ffffffff82440d00 completed boost. [ 299.051288] rcun ffffffff82440d00 awaking rcuc40. [ 299.051300] rcun ffffffff82440c00 completed wait for work. [ 299.051314] rcun ffffffff82440c00 initiating boost. [ 299.051323] rcun ffffffff82440c00 completed boost. [ 299.051328] rcun ffffffff82440c00 awaking rcuc31. [ 299.051356] rcun ffffffff82440d00 awakened rcuc40. [ 299.051360] rcun ffffffff82440d00 starting wait for work. [ 299.051383] rcun ffffffff82440c00 awakened rcuc31. [ 299.051388] rcun ffffffff82440c00 starting wait for work. [ 299.148138] rcun ffffffff82441300 initiating boost. [ 299.167706] rcun ffffffff82441300 completed boost. [ 299.167993] rcun ffffffff82441300 awaking rcuc132. [ 299.187516] rcun ffffffff82441300 awakened rcuc132. [ 299.187841] rcun ffffffff82441300 starting wait for work. [ 299.223164] rcun ffffffff82441300 completed wait for work. [ 299.223485] rcun ffffffff82441300 initiating boost. [ 299.227367] rcun ffffffff82441300 completed boost. [ 299.227654] rcun ffffffff82441300 awaking rcuc132. [ 299.227969] rcun ffffffff82441300 awakened rcuc132. [ 299.247460] rcun ffffffff82441300 starting wait for work. [ 299.278920] rcun ffffffff82440d00 completed wait for work. [ 299.279251] rcun ffffffff82440d00 initiating boost. [ 299.279497] rcun ffffffff82440d00 completed boost. [ 299.297867] rcun ffffffff82440d00 awaking rcuc40. [ 299.298180] rcun ffffffff82440d00 awakened rcuc40. [ 299.317166] rcun ffffffff82440d00 starting wait for work. [ 299.334809] rcun ffffffff82441300 completed wait for work. [ 299.336979] rcun ffffffff82441300 initiating boost. [ 299.337250] rcun ffffffff82441300 completed boost. 
[ 299.338720] rcun ffffffff82441100 completed wait for work. [ 299.338737] rcun ffffffff82441100 initiating boost. [ 299.338745] rcun ffffffff82441100 completed boost. [ 299.338752] rcun ffffffff82441100 awaking rcuc102. [ 299.338803] rcun ffffffff82441100 awakened rcuc102. [ 299.338809] rcun ffffffff82441100 starting wait for work. [ 299.342935] rcun ffffffff82440f00 completed wait for work. [ 299.342964] rcun ffffffff82440f00 initiating boost. [ 299.342991] rcun ffffffff82440f00 completed boost. [ 299.343009] rcun ffffffff82440f00 awaking rcuc70. [ 299.343064] rcun ffffffff82440f00 awakened rcuc70. [ 299.343077] rcun ffffffff82440f00 starting wait for work. [ 299.350627] rcun ffffffff82441100 completed wait for work. [ 299.350634] rcun ffffffff82441100 initiating boost. [ 299.350638] rcun ffffffff82441100 completed boost. [ 299.350641] rcun ffffffff82441100 awaking rcuc102. [ 299.350652] rcun ffffffff82441100 awakened rcuc102. [ 299.350655] rcun ffffffff82441100 starting wait for work. [ 299.350689] rcun ffffffff82440f00 completed wait for work. [ 299.350701] rcun ffffffff82440f00 initiating boost. [ 299.350708] rcun ffffffff82440f00 completed boost. [ 299.350716] rcun ffffffff82440f00 awaking rcuc70. [ 299.350807] rcun ffffffff82440f00 awakened rcuc70. [ 299.350815] rcun ffffffff82440f00 starting wait for work. [ 299.557141] rcun ffffffff82441300 awaking rcuc132. [ 299.557143] rcun ffffffff82441300 awakened rcuc132. [ 299.557145] rcun ffffffff82441300 starting wait for work. lb-g5plus-1t-host login: lb-g5plus-1t-host login: [ 379.823934] INFO: task rcun0:8 blocked for more than 120 seconds. [ 379.824295] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 379.843811] rcun0 D 0000000000000000 0 8 2 0x00000000 [ 379.844152] ffff882070d47e90 0000000000000046 ffff882070d46000 0000000000004000 [ 379.844178] 00000000001d1f40 ffff882070d47fd8 00000000001d1f40 ffff882070d47fd8 [ 379.844204] 0000000000004000 00000000001d1f40 ffff882070d18000 ffff882070d4a2b0 [ 379.844232] Call Trace: [ 379.844255] [<ffffffff810afa0a>] ? __lock_release+0x166/0x16f [ 379.844273] [<ffffffff81c21de9>] ? _raw_spin_unlock_irqrestore+0x3f/0x46 [ 379.844287] [<ffffffff810cf297>] ? rcu_cpu_kthread_timer+0x44/0x44 [ 379.844298] [<ffffffff810adcde>] ? trace_hardirqs_on+0xd/0xf [ 379.844309] [<ffffffff810cf297>] ? rcu_cpu_kthread_timer+0x44/0x44 [ 379.844325] [<ffffffff81099e3d>] kthread+0x8c/0xa8 [ 379.844342] [<ffffffff81c29ad4>] kernel_thread_helper+0x4/0x10 [ 379.844353] [<ffffffff81c22080>] ? retint_restore_args+0xe/0xe [ 379.844364] [<ffffffff81099db1>] ? __init_kthread_worker+0x5b/0x5b [ 379.844375] [<ffffffff81c29ad0>] ? gs_change+0xb/0xb [ 379.844379] INFO: lockdep is turned off. [ 379.844595] INFO: task rcun8:576 blocked for more than 120 seconds. [ 379.844598] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 379.844602] rcun8 D 0000000000000000 0 576 2 0x00000000 [ 379.844608] ffff881fff98de90 0000000000000046 ffff881fff98c000 0000000000004000 [ 379.844634] 00000000001d1f40 ffff881fff98dfd8 00000000001d1f40 ffff881fff98dfd8 [ 379.844658] 0000000000004000 00000000001d1f40 ffff882070d18000 ffff881fff9822b0 [ 379.844683] Call Trace: [ 379.844694] [<ffffffff810afa0a>] ? __lock_release+0x166/0x16f [ 379.844705] [<ffffffff81c21de9>] ? _raw_spin_unlock_irqrestore+0x3f/0x46 [ 379.844715] [<ffffffff810cf297>] ? rcu_cpu_kthread_timer+0x44/0x44 [ 379.844726] [<ffffffff810adcde>] ? trace_hardirqs_on+0xd/0xf [ 379.844736] [<ffffffff810cf297>] ? 
rcu_cpu_kthread_timer+0x44/0x44 [ 379.844747] [<ffffffff81099e3d>] kthread+0x8c/0xa8 [ 379.844759] [<ffffffff81c29ad4>] kernel_thread_helper+0x4/0x10 [ 379.844769] [<ffffffff81c22080>] ? retint_restore_args+0xe/0xe [ 379.844781] [<ffffffff81099db1>] ? __init_kthread_worker+0x5b/0x5b [ 379.844791] [<ffffffff81c29ad0>] ? gs_change+0xb/0xb [ 379.844794] INFO: lockdep is turned off. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  2011-05-23 22:01 ` Yinghai Lu
@ 2011-05-23 22:55 ` Yinghai Lu
  2011-05-23 22:58 ` Yinghai Lu
  2011-05-24 1:12 ` Paul E. McKenney
  1 sibling, 1 reply; 45+ messages in thread
From: Yinghai Lu @ 2011-05-23 22:55 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel, mingo, hpa, tglx, mingo

On 05/23/2011 03:01 PM, Yinghai Lu wrote:
> On 05/23/2011 02:25 PM, Paul E. McKenney wrote:
>> On Mon, May 23, 2011 at 01:14:22PM -0700, Yinghai Lu wrote:
>>> On 05/21/2011 07:08 AM, Paul E. McKenney wrote:
>>>> On Sat, May 21, 2011 at 06:18:44AM -0700, Paul E. McKenney wrote:
>>>>> On Fri, May 20, 2011 at 05:02:40PM -0700, Yinghai Lu wrote:
>>>>>> On 05/20/2011 04:49 PM, Paul E. McKenney wrote:
>>>>>>> On Fri, May 20, 2011 at 04:16:28PM -0700, Yinghai Lu wrote:
>>>>>> ...
>>>>>>>>
>>>>>>>> the same one i sent out before, but let DEBUG_LOCKING_API_SELFTESTS disabled.
>>>>>>>
>>>>>>> OK, just to make sure I understand... You are compiling exactly the
>>>>>>> same kernel source tree with exactly the same .config, just with two
>>>>>>> different versions of gcc, correct?
>>>>>> yes.
>>>>>>>
>>>>>>> If so, it is quite possible that the slow one is the correct one. :-/
>>>>>> yeah, new version always have problem.
>>>>>>
>>>>>> looks like opensuse11.3 has 4.5.0 and fedora14 has 4.5.1
>>>>>
>>>>> OK, so fedora14 is the fast one (4.5.1) and opensuse11.3 is the slow
>>>>> one (4.5.0), correct?
>>>>
>>>> And does commit c7a3786030 help? This commit (from Peter Zijlstra)
>>>> tidied up RCU kthreads' scheduler interactions. The patch is below,
>>>> though it is probably more convenient to pull it from the rcu/next
>>>> branch of:
>>>>
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
>>>>
>>

gcc in Fedora 14 is fine with your tree.

Looks like I need to dump opensuse 11.3 now.

Thanks

Yinghai

^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  2011-05-23 22:55 ` Yinghai Lu
@ 2011-05-23 22:58 ` Yinghai Lu
  2011-05-24 1:18 ` Paul E. McKenney
  0 siblings, 1 reply; 45+ messages in thread
From: Yinghai Lu @ 2011-05-23 22:58 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel, mingo, hpa, tglx, mingo

On 05/23/2011 03:55 PM, Yinghai Lu wrote:
> On 05/23/2011 03:01 PM, Yinghai Lu wrote:
>> On 05/23/2011 02:25 PM, Paul E. McKenney wrote:
>>> On Mon, May 23, 2011 at 01:14:22PM -0700, Yinghai Lu wrote:
>>>> On 05/21/2011 07:08 AM, Paul E. McKenney wrote:
>>>>> On Sat, May 21, 2011 at 06:18:44AM -0700, Paul E. McKenney wrote:
>>>>>> On Fri, May 20, 2011 at 05:02:40PM -0700, Yinghai Lu wrote:
>>>>>>> On 05/20/2011 04:49 PM, Paul E. McKenney wrote:
>>>>>>>> On Fri, May 20, 2011 at 04:16:28PM -0700, Yinghai Lu wrote:
>>>>>>> ...
>>>>>>>>>
>>>>>>>>> the same one i sent out before, but let DEBUG_LOCKING_API_SELFTESTS disabled.
>>>>>>>>
>>>>>>>> OK, just to make sure I understand... You are compiling exactly the
>>>>>>>> same kernel source tree with exactly the same .config, just with two
>>>>>>>> different versions of gcc, correct?
>>>>>>> yes.
>>>>>>>>
>>>>>>>> If so, it is quite possible that the slow one is the correct one. :-/
>>>>>>> yeah, new version always have problem.
>>>>>>>
>>>>>>> looks like opensuse11.3 has 4.5.0 and fedora14 has 4.5.1
>>>>>>
>>>>>> OK, so fedora14 is the fast one (4.5.1) and opensuse11.3 is the slow
>>>>>> one (4.5.0), correct?
>>>>>
>>>>> And does commit c7a3786030 help? This commit (from Peter Zijlstra)
>>>>> tidied up RCU kthreads' scheduler interactions. The patch is below,
>>>>> though it is probably more convenient to pull it from the rcu/next
>>>>> branch of:
>>>>>
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
>>>>>
>>>
> gcc in Fedora 14 is fine with your tree.
>

sorry, I should wait for longer to see Fedora 14 is ok.
got same warning with the one compiled from fedora 14...

[ 372.937251] INFO: task rcun0:8 blocked for more than 120 seconds.
[ 372.937618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 372.956130] rcun0 D 0000000000000000 0 8 2 0x00000000
[ 372.956498] ffff882070d65e90 0000000000000046 ffff882070d64000 0000000000004000
[ 372.956528] 00000000001d1f40 ffff882070d65fd8 00000000001d1f40 ffff882070d65fd8
[ 372.956555] 0000000000004000 00000000001d1f40 ffff882070d18000 ffff882070d6a2b0
[ 372.956581] Call Trace:
[ 372.956605] [<ffffffff810afce3>] ? __lock_release+0x166/0x16f
[ 372.956624] [<ffffffff81c229d1>] ? _raw_spin_unlock_irqrestore+0x3f/0x46
[ 372.956639] [<ffffffff810ce941>] ? rcu_cpu_kthread_should_stop+0x137/0x137
[ 372.956650] [<ffffffff810adfd5>] ? trace_hardirqs_on+0xd/0xf
[ 372.956661] [<ffffffff810ce941>] ? rcu_cpu_kthread_should_stop+0x137/0x137
[ 372.956673] [<ffffffff8109a0a5>] kthread+0x8c/0xa8
[ 372.956689] [<ffffffff81c2a754>] kernel_thread_helper+0x4/0x10
[ 372.956701] [<ffffffff81c22c80>] ? retint_restore_args+0xe/0xe
[ 372.956711] [<ffffffff8109a019>] ? __init_kthread_worker+0x5b/0x5b
[ 372.956722] [<ffffffff81c2a750>] ? gs_change+0xb/0xb
[ 372.956726] INFO: lockdep is turned off.
[ 492.750827] INFO: task rcun0:8 blocked for more than 120 seconds.
[ 492.751150] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 492.762991] rcun0 D 0000000000000000 0 8 2 0x00000000
[ 492.763264] ffff882070d65e90 0000000000000046 ffff882070d64000 0000000000004000
[ 492.763294] 00000000001d1f40 ffff882070d65fd8 00000000001d1f40 ffff882070d65fd8
[ 492.763320] 0000000000004000 00000000001d1f40 ffff882070d18000 ffff882070d6a2b0
[ 492.763346] Call Trace:
[ 492.763359] [<ffffffff810afce3>] ? __lock_release+0x166/0x16f
[ 492.763371] [<ffffffff81c229d1>] ? _raw_spin_unlock_irqrestore+0x3f/0x46
[ 492.763382] [<ffffffff810ce941>] ? rcu_cpu_kthread_should_stop+0x137/0x137
[ 492.763393] [<ffffffff810adfd5>] ? trace_hardirqs_on+0xd/0xf
[ 492.763404] [<ffffffff810ce941>] ? rcu_cpu_kthread_should_stop+0x137/0x137
[ 492.763414] [<ffffffff8109a0a5>] kthread+0x8c/0xa8
[ 492.763427] [<ffffffff81c2a754>] kernel_thread_helper+0x4/0x10
[ 492.763439] [<ffffffff81c22c80>] ? retint_restore_args+0xe/0xe
[ 492.763449] [<ffffffff8109a019>] ? __init_kthread_worker+0x5b/0x5b
[ 492.763460] [<ffffffff81c2a750>] ? gs_change+0xb/0xb
[ 492.763463] INFO: lockdep is turned off.

if reverting PeterZ's patch will not have that warning.

Thanks

^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  2011-05-23 22:58 ` Yinghai Lu
@ 2011-05-24 1:18 ` Paul E. McKenney
  2011-05-24 1:26 ` Yinghai Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Paul E. McKenney @ 2011-05-24 1:18 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo

On Mon, May 23, 2011 at 03:58:45PM -0700, Yinghai Lu wrote:
> On 05/23/2011 03:55 PM, Yinghai Lu wrote:
> > On 05/23/2011 03:01 PM, Yinghai Lu wrote:
> >> On 05/23/2011 02:25 PM, Paul E. McKenney wrote:
> >>> On Mon, May 23, 2011 at 01:14:22PM -0700, Yinghai Lu wrote:
> >>>> On 05/21/2011 07:08 AM, Paul E. McKenney wrote:
> >>>>> On Sat, May 21, 2011 at 06:18:44AM -0700, Paul E. McKenney wrote:
> >>>>>> On Fri, May 20, 2011 at 05:02:40PM -0700, Yinghai Lu wrote:
> >>>>>>> On 05/20/2011 04:49 PM, Paul E. McKenney wrote:
> >>>>>>>> On Fri, May 20, 2011 at 04:16:28PM -0700, Yinghai Lu wrote:
> >>>>>>> ...
> >>>>>>>>>
> >>>>>>>>> the same one i sent out before, but let DEBUG_LOCKING_API_SELFTESTS disabled.
> >>>>>>>>
> >>>>>>>> OK, just to make sure I understand... You are compiling exactly the
> >>>>>>>> same kernel source tree with exactly the same .config, just with two
> >>>>>>>> different versions of gcc, correct?
> >>>>>>> yes.
> >>>>>>>>
> >>>>>>>> If so, it is quite possible that the slow one is the correct one. :-/
> >>>>>>> yeah, new version always have problem.
> >>>>>>>
> >>>>>>> looks like opensuse11.3 has 4.5.0 and fedora14 has 4.5.1
> >>>>>>
> >>>>>> OK, so fedora14 is the fast one (4.5.1) and opensuse11.3 is the slow
> >>>>>> one (4.5.0), correct?
> >>>>>
> >>>>> And does commit c7a3786030 help? This commit (from Peter Zijlstra)
> >>>>> tidied up RCU kthreads' scheduler interactions. The patch is below,
> >>>>> though it is probably more convenient to pull it from the rcu/next
> >>>>> branch of:
> >>>>>
> >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
> >>>>>
> >>>
> > gcc in Fedora 14 is fine with your tree.
> >
>
> sorry, I should wait for longer to see Fedora 14 is ok.
> got same warning with the one compiled from fedora 14...
>
> [ 372.937251] INFO: task rcun0:8 blocked for more than 120 seconds.
> [ 372.937618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 372.956130] rcun0 D 0000000000000000 0 8 2 0x00000000
> [ 372.956498] ffff882070d65e90 0000000000000046 ffff882070d64000 0000000000004000
> [ 372.956528] 00000000001d1f40 ffff882070d65fd8 00000000001d1f40 ffff882070d65fd8
> [ 372.956555] 0000000000004000 00000000001d1f40 ffff882070d18000 ffff882070d6a2b0
> [ 372.956581] Call Trace:
> [ 372.956605] [<ffffffff810afce3>] ? __lock_release+0x166/0x16f
> [ 372.956624] [<ffffffff81c229d1>] ? _raw_spin_unlock_irqrestore+0x3f/0x46
> [ 372.956639] [<ffffffff810ce941>] ? rcu_cpu_kthread_should_stop+0x137/0x137
> [ 372.956650] [<ffffffff810adfd5>] ? trace_hardirqs_on+0xd/0xf
> [ 372.956661] [<ffffffff810ce941>] ? rcu_cpu_kthread_should_stop+0x137/0x137
> [ 372.956673] [<ffffffff8109a0a5>] kthread+0x8c/0xa8
> [ 372.956689] [<ffffffff81c2a754>] kernel_thread_helper+0x4/0x10
> [ 372.956701] [<ffffffff81c22c80>] ? retint_restore_args+0xe/0xe
> [ 372.956711] [<ffffffff8109a019>] ? __init_kthread_worker+0x5b/0x5b
> [ 372.956722] [<ffffffff81c2a750>] ? gs_change+0xb/0xb
> [ 372.956726] INFO: lockdep is turned off.
> [ 492.750827] INFO: task rcun0:8 blocked for more than 120 seconds.
> [ 492.751150] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 492.762991] rcun0 D 0000000000000000 0 8 2 0x00000000
> [ 492.763264] ffff882070d65e90 0000000000000046 ffff882070d64000 0000000000004000
> [ 492.763294] 00000000001d1f40 ffff882070d65fd8 00000000001d1f40 ffff882070d65fd8
> [ 492.763320] 0000000000004000 00000000001d1f40 ffff882070d18000 ffff882070d6a2b0
> [ 492.763346] Call Trace:
> [ 492.763359] [<ffffffff810afce3>] ? __lock_release+0x166/0x16f
> [ 492.763371] [<ffffffff81c229d1>] ? _raw_spin_unlock_irqrestore+0x3f/0x46
> [ 492.763382] [<ffffffff810ce941>] ? rcu_cpu_kthread_should_stop+0x137/0x137
> [ 492.763393] [<ffffffff810adfd5>] ? trace_hardirqs_on+0xd/0xf
> [ 492.763404] [<ffffffff810ce941>] ? rcu_cpu_kthread_should_stop+0x137/0x137
> [ 492.763414] [<ffffffff8109a0a5>] kthread+0x8c/0xa8
> [ 492.763427] [<ffffffff81c2a754>] kernel_thread_helper+0x4/0x10
> [ 492.763439] [<ffffffff81c22c80>] ? retint_restore_args+0xe/0xe
> [ 492.763449] [<ffffffff8109a019>] ? __init_kthread_worker+0x5b/0x5b
> [ 492.763460] [<ffffffff81c2a750>] ? gs_change+0xb/0xb
> [ 492.763463] INFO: lockdep is turned off.
>
> if reverting PeterZ's patch will not have that warning.

OK, so it looks like I need to get this out of the way in order to track
down the delays. Or does reverting PeterZ's patch get you a stable
system, but with the longish delays in memory_dev_init()? If the latter,
it might be more productive to handle the two problems separately.

For whatever it is worth, I do see about 5% increase in grace-period
duration when switching to kthreads. This is acceptable -- your
30x increase clearly is completely unacceptable and must be fixed.
Other than that, the main thing that affects grace period duration is
the setting of CONFIG_HZ -- the smaller the HZ value, the longer the
grace-period duration.

Thanx, Paul

^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  2011-05-24 1:18 ` Paul E. McKenney
@ 2011-05-24 1:26 ` Yinghai Lu
  2011-05-24 1:35 ` Paul E. McKenney
  0 siblings, 1 reply; 45+ messages in thread
From: Yinghai Lu @ 2011-05-24 1:26 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel, mingo, hpa, tglx, mingo

On 05/23/2011 06:18 PM, Paul E. McKenney wrote:

> OK, so it looks like I need to get this out of the way in order to track
> down the delays. Or does reverting PeterZ's patch get you a stable
> system, but with the longish delays in memory_dev_init()? If the latter,
> it might be more productive to handle the two problems separately.
>
> For whatever it is worth, I do see about 5% increase in grace-period
> duration when switching to kthreads. This is acceptable -- your
> 30x increase clearly is completely unacceptable and must be fixed.
> Other than that, the main thing that affects grace period duration is
> the setting of CONFIG_HZ -- the smaller the HZ value, the longer the
> grace-period duration.

for my 1024g system when memory hotadd is enabled in kernel config:
1. current linus tree + tip tree: memory_dev_init will take about 100s.
2. current linus tree + tip tree + your tree - Peterz patch:
   a. on fedora 14 gcc: will cost about 4s: like old times
   b. on opensuse 11.3 gcc: will cost about 10s.

Thanks

Yinghai Lu

^ permalink raw reply [flat|nested] 45+ messages in thread
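The memory_dev_init() costs quoted in this thread are differences between two printk timestamps in the boot log (the "cpu_dev_init done" and "memory_dev_init done" markers). A minimal sketch of that arithmetic follows; the helper names parse_printk_time() and phase_seconds() are hypothetical and are not part of any kernel or tool mentioned here, and the code assumes each log line starts with a "[ seconds.microseconds]" timestamp.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read the leading "[ seconds.micros]" printk
 * timestamp from one boot-log line.  Returns 0 and stores the value in
 * *secs on success, -1 if the line does not start with a timestamp.
 */
int parse_printk_time(const char *line, double *secs)
{
	assert(line != NULL && secs != NULL);
	return sscanf(line, " [ %lf", secs) == 1 ? 0 : -1;
}

/*
 * Hypothetical helper: seconds elapsed between two boot-log lines, for
 * example the "cpu_dev_init done" and "memory_dev_init done" markers.
 * Returns 0 and stores the difference in *elapsed, -1 on parse failure.
 */
int phase_seconds(const char *start_line, const char *end_line,
		  double *elapsed)
{
	double start, end;

	if (parse_printk_time(start_line, &start) != 0 ||
	    parse_printk_time(end_line, &end) != 0)
		return -1;
	*elapsed = end - start;
	return 0;
}
```

Fed the two marker lines from the first report in this thread ("[ 35.419453] cpu_dev_init done" and "[ 128.981770] memory_dev_init done"), phase_seconds() gives roughly 93.56 seconds, against the 3 or 4 seconds that report expected.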
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  2011-05-24 1:26 ` Yinghai Lu
@ 2011-05-24 1:35 ` Paul E. McKenney
  2011-05-24 21:23 ` Yinghai Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Paul E. McKenney @ 2011-05-24 1:35 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo

On Mon, May 23, 2011 at 06:26:23PM -0700, Yinghai Lu wrote:
> On 05/23/2011 06:18 PM, Paul E. McKenney wrote:
>
> > OK, so it looks like I need to get this out of the way in order to track
> > down the delays. Or does reverting PeterZ's patch get you a stable
> > system, but with the longish delays in memory_dev_init()? If the latter,
> > it might be more productive to handle the two problems separately.
> >
> > For whatever it is worth, I do see about 5% increase in grace-period
> > duration when switching to kthreads. This is acceptable -- your
> > 30x increase clearly is completely unacceptable and must be fixed.
> > Other than that, the main thing that affects grace period duration is
> > the setting of CONFIG_HZ -- the smaller the HZ value, the longer the
> > grace-period duration.
>
> for my 1024g system when memory hotadd is enabled in kernel config:
> 1. current linus tree + tip tree: memory_dev_init will take about 100s.
> 2. current linus tree + tip tree + your tree - Peterz patch:
>    a. on fedora 14 gcc: will cost about 4s: like old times
>    b. on opensuse 11.3 gcc: will cost about 10s.

So some patch in my tree that is not yet in tip makes things better?

If so, could you please see which one? Maybe that would give me a hint
that could make things better on opensuse 11.3 as well.

Thanx, Paul

^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  2011-05-24 1:35 ` Paul E. McKenney
@ 2011-05-24 21:23 ` Yinghai Lu
  2011-05-25 0:05 ` Paul E. McKenney
  2011-05-25 0:10 ` Yinghai Lu
  0 siblings, 2 replies; 45+ messages in thread
From: Yinghai Lu @ 2011-05-24 21:23 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel, mingo, hpa, tglx, mingo

On 05/23/2011 06:35 PM, Paul E. McKenney wrote:
> On Mon, May 23, 2011 at 06:26:23PM -0700, Yinghai Lu wrote:
>> On 05/23/2011 06:18 PM, Paul E. McKenney wrote:
>>
>>> OK, so it looks like I need to get this out of the way in order to track
>>> down the delays. Or does reverting PeterZ's patch get you a stable
>>> system, but with the longish delays in memory_dev_init()? If the latter,
>>> it might be more productive to handle the two problems separately.
>>>
>>> For whatever it is worth, I do see about 5% increase in grace-period
>>> duration when switching to kthreads. This is acceptable -- your
>>> 30x increase clearly is completely unacceptable and must be fixed.
>>> Other than that, the main thing that affects grace period duration is
>>> the setting of CONFIG_HZ -- the smaller the HZ value, the longer the
>>> grace-period duration.
>>
>> for my 1024g system when memory hotadd is enabled in kernel config:
>> 1. current linus tree + tip tree: memory_dev_init will take about 100s.
>> 2. current linus tree + tip tree + your tree - Peterz patch:
>>    a. on fedora 14 gcc: will cost about 4s: like old times
>>    b. on opensuse 11.3 gcc: will cost about 10s.
>
> So some patch in my tree that is not yet in tip makes things better?
>
> If so, could you please see which one? Maybe that would give me a hint
> that could make things better on opensuse 11.3 as well.

today's tip:

[ 31.795597] cpu_dev_init done
[ 40.930202] memory_dev_init done

after

commit e219b351fc90c0f5304e16efbc603b3b78843ea1
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Mon May 16 02:44:06 2011 -0700

    rcu: Remove old memory barriers from rcu_process_callbacks()

    Second step of partitioning of commit e59fb3120b.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 3731141..011bf6f 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1460,25 +1460,11 @@ __rcu_process_callbacks(struct rcu_state *rsp, struct rcu_data *rdp)
  */
 static void rcu_process_callbacks(void)
 {
-	/*
-	 * Memory references from any prior RCU read-side critical sections
-	 * executed by the interrupted code must be seen before any RCU
-	 * grace-period manipulations below.
-	 */
-	smp_mb(); /* See above block comment. */
-
 	__rcu_process_callbacks(&rcu_sched_state,
 				&__get_cpu_var(rcu_sched_data));
 	__rcu_process_callbacks(&rcu_bh_state, &__get_cpu_var(rcu_bh_data));
 	rcu_preempt_process_callbacks();
 
-	/*
-	 * Memory references from any later RCU read-side critical sections
-	 * executed by the interrupted code must be seen after any RCU
-	 * grace-period manipulations above.
-	 */
-	smp_mb(); /* See above block comment. */
-
 	/* If we are last CPU on way to dyntick-idle mode, accelerate it. */
 	rcu_needs_cpu_flush();
 }

cause

[ 32.235103] cpu_dev_init done
[ 74.897943] memory_dev_init done

then add

commit d0d642680d4cf5cc2ccf542b74a3c8b7e197306b
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Mon May 16 02:52:04 2011 -0700

    rcu: Don't do reschedule unless in irq

    Condition the set_need_resched() in rcu_irq_exit() on in_irq(). This
    should be a no-op, because rcu_irq_exit() should only be called from irq.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 011bf6f..195b3a3 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -421,8 +421,9 @@ void rcu_irq_exit(void)
 	WARN_ON_ONCE(rdtp->dynticks & 0x1);
 
 	/* If the interrupt queued a callback, get out of dyntick mode. */
-	if (__this_cpu_read(rcu_sched_data.nxtlist) ||
-	    __this_cpu_read(rcu_bh_data.nxtlist))
+	if (in_irq() &&
+	    (__this_cpu_read(rcu_sched_data.nxtlist) ||
+	     __this_cpu_read(rcu_bh_data.nxtlist)))
 		set_need_resched();
 }

got:

[ 34.384490] cpu_dev_init done
[ 86.656322] memory_dev_init done

after

commit fcfc28801f5b3b9c70616fc57e3a2c6f52014e14
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Mon May 16 14:27:31 2011 -0700

    rcu: Make rcu_enter_nohz() pay attention to nesting

    The old version of rcu_enter_nohz() forced RCU into nohz mode even if
    the nesting count was non-zero. This change causes rcu_enter_nohz()
    to hold off for non-zero nesting counts.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 195b3a3..99c6038 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -324,8 +324,8 @@ void rcu_enter_nohz(void)
 	smp_mb(); /* CPUs seeing ++ must see prior RCU read-side crit sects */
 	local_irq_save(flags);
 	rdtp = &__get_cpu_var(rcu_dynticks);
-	rdtp->dynticks++;
-	rdtp->dynticks_nesting--;
+	if (--rdtp->dynticks_nesting == 0)
+		rdtp->dynticks++;
 	WARN_ON_ONCE(rdtp->dynticks & 0x1);
 	local_irq_restore(flags);
 }

got:

[ 32.414049] cpu_dev_init done
[ 38.237979] memory_dev_init done

after:

commit bcd6e68330f893a81b3519ab3c5fc2bebbc9988c
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Tue Sep 7 10:38:22 2010 -0700

    rcu: Decrease memory-barrier usage based on semi-formal proof
...

got:

[ 32.447936] cpu_dev_init done
[ 111.027066] memory_dev_init done

after

commit fbb753fb9dd62318d27fa070c686423ced139817
Author: Paul E. McKenney <paul.mckenney@linaro.org>
Date: Wed May 11 05:33:33 2011 -0700

    atomic: Add atomic_or()

    An atomic_or() function is needed by TREE_RCU to avoid deadlock, so add
    a generic version.

    Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
    Signed-off-by: Paul E.
McKenney <paulmck@linux.vnet.ibm.com> diff --git a/include/linux/atomic.h b/include/linux/atomic.h index 96c038e..ee456c7 100644 --- a/include/linux/atomic.h +++ b/include/linux/atomic.h @@ -34,4 +34,17 @@ static inline int atomic_inc_not_zero_hint(atomic_t *v, int hint) } #endif +#ifndef CONFIG_ARCH_HAS_ATOMIC_OR +static inline void atomic_or(int i, atomic_t *v) +{ + int old; + int new; + + do { + old = atomic_read(v); + new = old | i; + } while (atomic_cmpxchg(v, old, new) != old); +} +#endif /* #ifndef CONFIG_ARCH_HAS_ATOMIC_OR */ + #endif /* _LINUX_ATOMIC_H */ got: [ 32.803704] cpu_dev_init done [ 99.171292] memory_dev_init done ^ permalink raw reply related [flat|nested] 45+ messages in thread
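[Editorial sketch] The generic atomic_or() quoted above is a compare-and-swap retry loop. For readers who want to try the pattern outside the kernel, here is a minimal userspace analogue. It assumes C11 <stdatomic.h> in place of the kernel's atomic_t, atomic_read() and atomic_cmpxchg(), and the name sketch_atomic_or() is hypothetical. It is not the kernel implementation.

```c
#include <stdatomic.h>

/*
 * Hypothetical userspace analogue of the generic atomic_or() from the
 * patch, built on C11 atomics rather than the kernel's atomic_t API.
 */
static inline void sketch_atomic_or(int i, atomic_int *v)
{
	int old = atomic_load(v);

	/*
	 * If another thread changed *v since we read it, the exchange
	 * fails and 'old' is refreshed with the current value, so the
	 * next iteration recomputes old | i from fresh data.
	 */
	while (!atomic_compare_exchange_weak(v, &old, old | i))
		;
}
```

Starting from 0x5, calling sketch_atomic_or(0x2, &flags) should leave 0x7 in flags. As Paul notes later in the thread, merely adding such an unused static inline should not change timing by itself.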
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-24 21:23 ` Yinghai Lu @ 2011-05-25 0:05 ` Paul E. McKenney 2011-05-25 0:13 ` Yinghai Lu 2011-05-25 0:16 ` Paul E. McKenney 2011-05-25 0:10 ` Yinghai Lu 1 sibling, 2 replies; 45+ messages in thread From: Paul E. McKenney @ 2011-05-25 0:05 UTC (permalink / raw) To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo On Tue, May 24, 2011 at 02:23:45PM -0700, Yinghai Lu wrote: > On 05/23/2011 06:35 PM, Paul E. McKenney wrote: > > On Mon, May 23, 2011 at 06:26:23PM -0700, Yinghai Lu wrote: > >> On 05/23/2011 06:18 PM, Paul E. McKenney wrote: > >> > >>> OK, so it looks like I need to get this out of the way in order to track > >>> down the delays. Or does reverting PeterZ's patch get you a stable > >>> system, but with the longish delays in memory_dev_init()? If the latter, > >>> it might be more productive to handle the two problems separately. > >>> > >>> For whatever it is worth, I do see about 5% increase in grace-period > >>> duration when switching to kthreads. This is acceptable -- your > >>> 30x increase clearly is completely unacceptable and must be fixed. > >>> Other than that, the main thing that affects grace period duration is > >>> the setting of CONFIG_HZ -- the smaller the HZ value, the longer the > >>> grace-period duration. > >> > >> for my 1024g system when memory hotadd is enabled in kernel config: > >> 1. current linus tree + tip tree: memory_dev_init will take about 100s. > >> 2. current linus tree + tip tree + your tree - Peterz patch: > >> a. on fedora 14 gcc: will cost about 4s: like old times > >> b. on opensuse 11.3 gcc: will cost about 10s. > > > > So some patch in my tree that is not yet in tip makes things better? > > > > If so, could you please see which one? Maybe that would give me a hint > > that could make things better on opensuse 11.3 as well. 
> > today's tip: > > [ 31.795597] cpu_dev_init done > [ 40.930202] memory_dev_init done One other question... What is memory_dev_init() doing to wait for so many RCU grace periods? (Yes, I do need to fix the slowdowns in any case, but I am curious.) > after > > commit e219b351fc90c0f5304e16efbc603b3b78843ea1 > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > Date: Mon May 16 02:44:06 2011 -0700 > > rcu: Remove old memory barriers from rcu_process_callbacks() > > Second step of partitioning of commit e59fb3120b. > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c > index 3731141..011bf6f 100644 > --- a/kernel/rcutree.c > +++ b/kernel/rcutree.c > @@ -1460,25 +1460,11 @@ __rcu_process_callbacks(struct rcu_state *rsp, struct rcu_data *rdp) > */ > static void rcu_process_callbacks(void) > { > - /* > - * Memory references from any prior RCU read-side critical sections > - * executed by the interrupted code must be seen before any RCU > - * grace-period manipulations below. > - */ > - smp_mb(); /* See above block comment. */ > - > __rcu_process_callbacks(&rcu_sched_state, > &__get_cpu_var(rcu_sched_data)); > __rcu_process_callbacks(&rcu_bh_state, &__get_cpu_var(rcu_bh_data)); > rcu_preempt_process_callbacks(); > > - /* > - * Memory references from any later RCU read-side critical sections > - * executed by the interrupted code must be seen after any RCU > - * grace-period manipulations above. > - */ > - smp_mb(); /* See above block comment. */ > - > /* If we are last CPU on way to dyntick-idle mode, accelerate it. */ > rcu_needs_cpu_flush(); > } > > cause > > [ 32.235103] cpu_dev_init done > [ 74.897943] memory_dev_init done > > then add > > commit d0d642680d4cf5cc2ccf542b74a3c8b7e197306b > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > Date: Mon May 16 02:52:04 2011 -0700 > > rcu: Don't do reschedule unless in irq > > Condition the set_need_resched() in rcu_irq_exit() on in_irq(). 
This > should be a no-op, because rcu_irq_exit() should only be called from irq. > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c > index 011bf6f..195b3a3 100644 > --- a/kernel/rcutree.c > +++ b/kernel/rcutree.c > @@ -421,8 +421,9 @@ void rcu_irq_exit(void) > WARN_ON_ONCE(rdtp->dynticks & 0x1); > > /* If the interrupt queued a callback, get out of dyntick mode. */ > - if (__this_cpu_read(rcu_sched_data.nxtlist) || > - __this_cpu_read(rcu_bh_data.nxtlist)) > + if (in_irq() && > + (__this_cpu_read(rcu_sched_data.nxtlist) || > + __this_cpu_read(rcu_bh_data.nxtlist))) > set_need_resched(); > } > > got: > > [ 34.384490] cpu_dev_init done > [ 86.656322] memory_dev_init done > > > after > > commit fcfc28801f5b3b9c70616fc57e3a2c6f52014e14 > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > Date: Mon May 16 14:27:31 2011 -0700 > > rcu: Make rcu_enter_nohz() pay attention to nesting > > The old version of rcu_enter_nohz() forced RCU into nohz mode even if > the nesting count was non-zero. This change causes rcu_enter_nohz() > to hold off for non-zero nesting counts. > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c > index 195b3a3..99c6038 100644 > --- a/kernel/rcutree.c > +++ b/kernel/rcutree.c > @@ -324,8 +324,8 @@ void rcu_enter_nohz(void) > smp_mb(); /* CPUs seeing ++ must see prior RCU read-side crit sects */ > local_irq_save(flags); > rdtp = &__get_cpu_var(rcu_dynticks); > - rdtp->dynticks++; > - rdtp->dynticks_nesting--; > + if (--rdtp->dynticks_nesting == 0) > + rdtp->dynticks++; > WARN_ON_ONCE(rdtp->dynticks & 0x1); > local_irq_restore(flags); > } > > got: > > [ 32.414049] cpu_dev_init done > [ 38.237979] memory_dev_init done So this is best for you -- where we have done all but the last commit of restoring "Decrease memory-barrier usage based on semi-formal proof". 
It makes sense that this one would help, as it is eliminating delays due to misnesting. These delays are not hangs, as force_quiescent_state() will eventually force the right thing to happen, but getting rid of these delays should indeed speed things up. > after: > commit bcd6e68330f893a81b3519ab3c5fc2bebbc9988c > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > Date: Tue Sep 7 10:38:22 2010 -0700 > > rcu: Decrease memory-barrier usage based on semi-formal proof > ... > > got: > > [ 32.447936] cpu_dev_init done > [ 111.027066] memory_dev_init done So there is something nasty in this patch. Not seeing it immediately, but it does give me some focus for both code inspection and possible diagnostic patches. > after > > commit fbb753fb9dd62318d27fa070c686423ced139817 > Author: Paul E. McKenney <paul.mckenney@linaro.org> > Date: Wed May 11 05:33:33 2011 -0700 > > atomic: Add atomic_or() > > An atomic_or() function is needed by TREE_RCU to avoid deadlock, so > add a generic version. > > Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > diff --git a/include/linux/atomic.h b/include/linux/atomic.h > index 96c038e..ee456c7 100644 > --- a/include/linux/atomic.h > +++ b/include/linux/atomic.h > @@ -34,4 +34,17 @@ static inline int atomic_inc_not_zero_hint(atomic_t *v, int hint) > } > #endif > > +#ifndef CONFIG_ARCH_HAS_ATOMIC_OR > +static inline void atomic_or(int i, atomic_t *v) > +{ > + int old; > + int new; > + > + do { > + old = atomic_read(v); > + new = old | i; > + } while (atomic_cmpxchg(v, old, new) != old); > +} > +#endif /* #ifndef CONFIG_ARCH_HAS_ATOMIC_OR */ > + > #endif /* _LINUX_ATOMIC_H */ > > got: > > [ 32.803704] cpu_dev_init done > [ 99.171292] memory_dev_init done So the difference between these two is noise, I hope. Adding a static inline function that is not used should not have an effect on performance. 
Still, the difference between 6 seconds and 60 seconds rises far above this noise level, so the big differences are likely quite real. Thanx, Paul ^ permalink raw reply [flat|nested] 45+ messages in thread
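[Editorial sketch] The effect of the rcu_enter_nohz() nesting change discussed above can be seen in a toy model. The struct and function names below are hypothetical, and the model covers only the two counters touched by the quoted diff (no memory barriers, no irq handling). It assumes, following the WARN_ON_ONCE(rdtp->dynticks & 0x1) check in the diff, that an even dynticks value means the CPU is in dyntick-idle mode.

```c
#include <assert.h>

/* Toy stand-in for the two fields of struct rcu_dynticks used in the diff. */
struct toy_dynticks {
	int dynticks;         /* even: dyntick-idle, odd: not idle (assumed) */
	int dynticks_nesting; /* nesting count consulted by the new code */
};

/* Old behaviour: always flip the dynticks counter, whatever the nesting. */
static void toy_enter_nohz_old(struct toy_dynticks *t)
{
	t->dynticks++;
	t->dynticks_nesting--;
}

/* New behaviour: flip dynticks only when the nesting count reaches zero. */
static void toy_enter_nohz_new(struct toy_dynticks *t)
{
	if (--t->dynticks_nesting == 0)
		t->dynticks++;
	/* Mirrors the kernel's sanity check once idle has really been entered. */
	assert(t->dynticks_nesting != 0 || (t->dynticks & 0x1) == 0);
}
```

With a starting state of dynticks = 1 and dynticks_nesting = 2, the old variant makes dynticks even (apparently idle) while the nesting count is still 1. The new variant leaves dynticks odd until a second call brings the nesting count to zero.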
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-25 0:05 ` Paul E. McKenney @ 2011-05-25 0:13 ` Yinghai Lu 2011-05-25 4:46 ` Paul E. McKenney 2011-05-25 7:18 ` Ingo Molnar 2011-05-25 0:16 ` Paul E. McKenney 1 sibling, 2 replies; 45+ messages in thread From: Yinghai Lu @ 2011-05-25 0:13 UTC (permalink / raw) To: paulmck; +Cc: linux-kernel, mingo, hpa, tglx, mingo On 05/24/2011 05:05 PM, Paul E. McKenney wrote: > On Tue, May 24, 2011 at 02:23:45PM -0700, Yinghai Lu wrote: >> On 05/23/2011 06:35 PM, Paul E. McKenney wrote: >>> On Mon, May 23, 2011 at 06:26:23PM -0700, Yinghai Lu wrote: >>>> On 05/23/2011 06:18 PM, Paul E. McKenney wrote: >>>> >>>>> OK, so it looks like I need to get this out of the way in order to track >>>>> down the delays. Or does reverting PeterZ's patch get you a stable >>>>> system, but with the longish delays in memory_dev_init()? If the latter, >>>>> it might be more productive to handle the two problems separately. >>>>> >>>>> For whatever it is worth, I do see about 5% increase in grace-period >>>>> duration when switching to kthreads. This is acceptable -- your >>>>> 30x increase clearly is completely unacceptable and must be fixed. >>>>> Other than that, the main thing that affects grace period duration is >>>>> the setting of CONFIG_HZ -- the smaller the HZ value, the longer the >>>>> grace-period duration. >>>> >>>> for my 1024g system when memory hotadd is enabled in kernel config: >>>> 1. current linus tree + tip tree: memory_dev_init will take about 100s. >>>> 2. current linus tree + tip tree + your tree - Peterz patch: >>>> a. on fedora 14 gcc: will cost about 4s: like old times >>>> b. on opensuse 11.3 gcc: will cost about 10s. >>> >>> So some patch in my tree that is not yet in tip makes things better? >>> >>> If so, could you please see which one? Maybe that would give me a hint >>> that could make things better on opensuse 11.3 as well. 
>> >> today's tip: >> >> [ 31.795597] cpu_dev_init done >> [ 40.930202] memory_dev_init done > > One other question... What is memory_dev_init() doing to wait for so > many RCU grace periods? (Yes, I do need to fix the slowdowns in any > case, but I am curious.) looks like it register some in sysfs /* * Initialize the sysfs support for memory devices... */ int __init memory_dev_init(void) { unsigned int i; int ret; int err; unsigned long block_sz; memory_sysdev_class.kset.uevent_ops = &memory_uevent_ops; ret = sysdev_class_register(&memory_sysdev_class); if (ret) goto out; block_sz = get_memory_block_size(); sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE; /* * Create entries for memory sections that were found * during boot and have been initialized */ for (i = 0; i < NR_MEM_SECTIONS; i++) { if (!present_section_nr(i)) continue; err = add_memory_section(0, __nr_to_section(i), MEM_ONLINE, BOOT); if (!ret) ret = err; } err = memory_probe_init(); if (!ret) ret = err; err = memory_fail_init(); if (!ret) ret = err; err = block_size_init(); if (!ret) ret = err; out: if (ret) printk(KERN_ERR "%s() failed: %d\n", __func__, ret); return ret; } > >> after >> >> commit e219b351fc90c0f5304e16efbc603b3b78843ea1 >> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> >> Date: Mon May 16 02:44:06 2011 -0700 >> >> rcu: Remove old memory barriers from rcu_process_callbacks() >> >> Second step of partitioning of commit e59fb3120b. >> >> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> >> >> diff --git a/kernel/rcutree.c b/kernel/rcutree.c >> index 3731141..011bf6f 100644 >> --- a/kernel/rcutree.c >> +++ b/kernel/rcutree.c >> @@ -1460,25 +1460,11 @@ __rcu_process_callbacks(struct rcu_state *rsp, struct rcu_data *rdp) >> */ >> static void rcu_process_callbacks(void) >> { >> - /* >> - * Memory references from any prior RCU read-side critical sections >> - * executed by the interrupted code must be seen before any RCU >> - * grace-period manipulations below. 
>> - */ >> - smp_mb(); /* See above block comment. */ >> - >> __rcu_process_callbacks(&rcu_sched_state, >> &__get_cpu_var(rcu_sched_data)); >> __rcu_process_callbacks(&rcu_bh_state, &__get_cpu_var(rcu_bh_data)); >> rcu_preempt_process_callbacks(); >> >> - /* >> - * Memory references from any later RCU read-side critical sections >> - * executed by the interrupted code must be seen after any RCU >> - * grace-period manipulations above. >> - */ >> - smp_mb(); /* See above block comment. */ >> - >> /* If we are last CPU on way to dyntick-idle mode, accelerate it. */ >> rcu_needs_cpu_flush(); >> } >> >> cause >> >> [ 32.235103] cpu_dev_init done >> [ 74.897943] memory_dev_init done >> >> then add >> >> commit d0d642680d4cf5cc2ccf542b74a3c8b7e197306b >> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> >> Date: Mon May 16 02:52:04 2011 -0700 >> >> rcu: Don't do reschedule unless in irq >> >> Condition the set_need_resched() in rcu_irq_exit() on in_irq(). This >> should be a no-op, because rcu_irq_exit() should only be called from irq. >> >> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> >> >> diff --git a/kernel/rcutree.c b/kernel/rcutree.c >> index 011bf6f..195b3a3 100644 >> --- a/kernel/rcutree.c >> +++ b/kernel/rcutree.c >> @@ -421,8 +421,9 @@ void rcu_irq_exit(void) >> WARN_ON_ONCE(rdtp->dynticks & 0x1); >> >> /* If the interrupt queued a callback, get out of dyntick mode. */ >> - if (__this_cpu_read(rcu_sched_data.nxtlist) || >> - __this_cpu_read(rcu_bh_data.nxtlist)) >> + if (in_irq() && >> + (__this_cpu_read(rcu_sched_data.nxtlist) || >> + __this_cpu_read(rcu_bh_data.nxtlist))) >> set_need_resched(); >> } >> >> got: >> >> [ 34.384490] cpu_dev_init done >> [ 86.656322] memory_dev_init done >> >> >> after >> >> commit fcfc28801f5b3b9c70616fc57e3a2c6f52014e14 >> Author: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> >> Date: Mon May 16 14:27:31 2011 -0700 >> >> rcu: Make rcu_enter_nohz() pay attention to nesting >> >> The old version of rcu_enter_nohz() forced RCU into nohz mode even if >> the nesting count was non-zero. This change causes rcu_enter_nohz() >> to hold off for non-zero nesting counts. >> >> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> >> >> diff --git a/kernel/rcutree.c b/kernel/rcutree.c >> index 195b3a3..99c6038 100644 >> --- a/kernel/rcutree.c >> +++ b/kernel/rcutree.c >> @@ -324,8 +324,8 @@ void rcu_enter_nohz(void) >> smp_mb(); /* CPUs seeing ++ must see prior RCU read-side crit sects */ >> local_irq_save(flags); >> rdtp = &__get_cpu_var(rcu_dynticks); >> - rdtp->dynticks++; >> - rdtp->dynticks_nesting--; >> + if (--rdtp->dynticks_nesting == 0) >> + rdtp->dynticks++; >> WARN_ON_ONCE(rdtp->dynticks & 0x1); >> local_irq_restore(flags); >> } >> >> got: >> >> [ 32.414049] cpu_dev_init done >> [ 38.237979] memory_dev_init done > > So this is best for you -- where we have done all but the last commit > of restoring "Decrease memory-barrier usage based on semi-formal proof". > It makes sense that this one would help, as it is eliminating delays > due to misnesting. These delays are not hangs, as force_quiescent_state() > will eventually force the right thing to happen, but getting rid of these > delays should indeed speed things up. > >> after: >> commit bcd6e68330f893a81b3519ab3c5fc2bebbc9988c >> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> >> Date: Tue Sep 7 10:38:22 2010 -0700 >> >> rcu: Decrease memory-barrier usage based on semi-formal proof >> ... >> >> got: >> >> [ 32.447936] cpu_dev_init done >> [ 111.027066] memory_dev_init done > > So there is something nasty in this patch. > > Not seeing it immediately, but it does give me some focus for both > code inspection and possible diagnostic patches. > >> after >> >> commit fbb753fb9dd62318d27fa070c686423ced139817 >> Author: Paul E. 
McKenney <paul.mckenney@linaro.org> >> Date: Wed May 11 05:33:33 2011 -0700 >> >> atomic: Add atomic_or() >> >> An atomic_or() function is needed by TREE_RCU to avoid deadlock, so >> add a generic version. >> >> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> >> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> >> >> diff --git a/include/linux/atomic.h b/include/linux/atomic.h >> index 96c038e..ee456c7 100644 >> --- a/include/linux/atomic.h >> +++ b/include/linux/atomic.h >> @@ -34,4 +34,17 @@ static inline int atomic_inc_not_zero_hint(atomic_t *v, int hint) >> } >> #endif >> >> +#ifndef CONFIG_ARCH_HAS_ATOMIC_OR >> +static inline void atomic_or(int i, atomic_t *v) >> +{ >> + int old; >> + int new; >> + >> + do { >> + old = atomic_read(v); >> + new = old | i; >> + } while (atomic_cmpxchg(v, old, new) != old); >> +} >> +#endif /* #ifndef CONFIG_ARCH_HAS_ATOMIC_OR */ >> + >> #endif /* _LINUX_ATOMIC_H */ >> >> got: >> >> [ 32.803704] cpu_dev_init done >> [ 99.171292] memory_dev_init done > > So the difference between these two is noise, I hope. Adding a static > inline function that is not used should not have an effect on performance. > Still, the difference between 6 seconds and 60 seconds rises far above > this noise level, so the big differences are likely quite real. could be softirq to kthread change... ^ permalink raw reply [flat|nested] 45+ messages in thread
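[Editorial sketch] The bisection results quoted above are easier to compare as elapsed times than as raw printk timestamps. The helper below tabulates the six timestamp pairs reported in this thread and subtracts them. The struct, the function names and the shortened commit labels are hypothetical, and the sketch assumes that the gap between the two "done" messages is dominated by memory_dev_init().

```c
#include <stddef.h>

struct boot_sample {
	const char *label; /* last commit applied for this boot */
	double cpu_done;   /* timestamp of "cpu_dev_init done" */
	double mem_done;   /* timestamp of "memory_dev_init done" */
};

/* Timestamps copied from the boot logs posted in this thread. */
static const struct boot_sample samples[] = {
	{ "today's tip",                             31.795597,  40.930202 },
	{ "e219b351 remove old memory barriers",     32.235103,  74.897943 },
	{ "d0d64268 no reschedule unless in irq",    34.384490,  86.656322 },
	{ "fcfc2880 rcu_enter_nohz() nesting",       32.414049,  38.237979 },
	{ "bcd6e683 decrease memory-barrier usage",  32.447936, 111.027066 },
	{ "fbb753fb atomic_or()",                    32.803704,  99.171292 },
};

#define NR_SAMPLES (sizeof(samples) / sizeof(samples[0]))

/* Seconds between the two boot messages for one sample. */
static double mem_init_seconds(const struct boot_sample *s)
{
	return s->mem_done - s->cpu_done;
}

/* Index of the sample with the longest memory_dev_init() interval. */
static size_t slowest_sample(void)
{
	size_t i, worst = 0;

	for (i = 1; i < NR_SAMPLES; i++)
		if (mem_init_seconds(&samples[i]) > mem_init_seconds(&samples[worst]))
			worst = i;
	return worst;
}
```

On these numbers the intervals range from roughly 6 seconds (after the nesting fix) to roughly 79 seconds (after "Decrease memory-barrier usage"), which matches the order-of-magnitude gap Paul describes.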
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-25 0:13 ` Yinghai Lu @ 2011-05-25 4:46 ` Paul E. McKenney 2011-05-25 7:24 ` Ingo Molnar 2011-05-25 7:18 ` Ingo Molnar 1 sibling, 1 reply; 45+ messages in thread From: Paul E. McKenney @ 2011-05-25 4:46 UTC (permalink / raw) To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo On Tue, May 24, 2011 at 05:13:06PM -0700, Yinghai Lu wrote: > On 05/24/2011 05:05 PM, Paul E. McKenney wrote: > > On Tue, May 24, 2011 at 02:23:45PM -0700, Yinghai Lu wrote: > >> On 05/23/2011 06:35 PM, Paul E. McKenney wrote: > >>> On Mon, May 23, 2011 at 06:26:23PM -0700, Yinghai Lu wrote: > >>>> On 05/23/2011 06:18 PM, Paul E. McKenney wrote: > >>>> > >>>>> OK, so it looks like I need to get this out of the way in order to track > >>>>> down the delays. Or does reverting PeterZ's patch get you a stable > >>>>> system, but with the longish delays in memory_dev_init()? If the latter, > >>>>> it might be more productive to handle the two problems separately. > >>>>> > >>>>> For whatever it is worth, I do see about 5% increase in grace-period > >>>>> duration when switching to kthreads. This is acceptable -- your > >>>>> 30x increase clearly is completely unacceptable and must be fixed. > >>>>> Other than that, the main thing that affects grace period duration is > >>>>> the setting of CONFIG_HZ -- the smaller the HZ value, the longer the > >>>>> grace-period duration. > >>>> > >>>> for my 1024g system when memory hotadd is enabled in kernel config: > >>>> 1. current linus tree + tip tree: memory_dev_init will take about 100s. > >>>> 2. current linus tree + tip tree + your tree - Peterz patch: > >>>> a. on fedora 14 gcc: will cost about 4s: like old times > >>>> b. on opensuse 11.3 gcc: will cost about 10s. > >>> > >>> So some patch in my tree that is not yet in tip makes things better? > >>> > >>> If so, could you please see which one? 
Maybe that would give me a hint > >>> that could make things better on opensuse 11.3 as well. > >> > >> today's tip: > >> > >> [ 31.795597] cpu_dev_init done > >> [ 40.930202] memory_dev_init done > > > > One other question... What is memory_dev_init() doing to wait for so > > many RCU grace periods? (Yes, I do need to fix the slowdowns in any > > case, but I am curious.) > > looks like it register some in sysfs Use of synchronize_rcu() for unregistering would make sense, but I don't understand why it is needed when registering. Thanx, Paul > /* > * Initialize the sysfs support for memory devices... > */ > int __init memory_dev_init(void) > { > unsigned int i; > int ret; > int err; > unsigned long block_sz; > > memory_sysdev_class.kset.uevent_ops = &memory_uevent_ops; > ret = sysdev_class_register(&memory_sysdev_class); > if (ret) > goto out; > > block_sz = get_memory_block_size(); > sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE; > > /* > * Create entries for memory sections that were found > * during boot and have been initialized > */ > for (i = 0; i < NR_MEM_SECTIONS; i++) { > if (!present_section_nr(i)) > continue; > err = add_memory_section(0, __nr_to_section(i), MEM_ONLINE, > BOOT); > if (!ret) > ret = err; > } > > err = memory_probe_init(); > if (!ret) > ret = err; > err = memory_fail_init(); > if (!ret) > ret = err; > err = block_size_init(); > if (!ret) > ret = err; > out: > if (ret) > printk(KERN_ERR "%s() failed: %d\n", __func__, ret); > return ret; > } > > > > > >> after > >> > >> commit e219b351fc90c0f5304e16efbc603b3b78843ea1 > >> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > >> Date: Mon May 16 02:44:06 2011 -0700 > >> > >> rcu: Remove old memory barriers from rcu_process_callbacks() > >> > >> Second step of partitioning of commit e59fb3120b. > >> > >> Signed-off-by: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> > >> > >> diff --git a/kernel/rcutree.c b/kernel/rcutree.c > >> index 3731141..011bf6f 100644 > >> --- a/kernel/rcutree.c > >> +++ b/kernel/rcutree.c > >> @@ -1460,25 +1460,11 @@ __rcu_process_callbacks(struct rcu_state *rsp, struct rcu_data *rdp) > >> */ > >> static void rcu_process_callbacks(void) > >> { > >> - /* > >> - * Memory references from any prior RCU read-side critical sections > >> - * executed by the interrupted code must be seen before any RCU > >> - * grace-period manipulations below. > >> - */ > >> - smp_mb(); /* See above block comment. */ > >> - > >> __rcu_process_callbacks(&rcu_sched_state, > >> &__get_cpu_var(rcu_sched_data)); > >> __rcu_process_callbacks(&rcu_bh_state, &__get_cpu_var(rcu_bh_data)); > >> rcu_preempt_process_callbacks(); > >> > >> - /* > >> - * Memory references from any later RCU read-side critical sections > >> - * executed by the interrupted code must be seen after any RCU > >> - * grace-period manipulations above. > >> - */ > >> - smp_mb(); /* See above block comment. */ > >> - > >> /* If we are last CPU on way to dyntick-idle mode, accelerate it. */ > >> rcu_needs_cpu_flush(); > >> } > >> > >> cause > >> > >> [ 32.235103] cpu_dev_init done > >> [ 74.897943] memory_dev_init done > >> > >> then add > >> > >> commit d0d642680d4cf5cc2ccf542b74a3c8b7e197306b > >> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > >> Date: Mon May 16 02:52:04 2011 -0700 > >> > >> rcu: Don't do reschedule unless in irq > >> > >> Condition the set_need_resched() in rcu_irq_exit() on in_irq(). This > >> should be a no-op, because rcu_irq_exit() should only be called from irq. > >> > >> Signed-off-by: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> > >> > >> diff --git a/kernel/rcutree.c b/kernel/rcutree.c > >> index 011bf6f..195b3a3 100644 > >> --- a/kernel/rcutree.c > >> +++ b/kernel/rcutree.c > >> @@ -421,8 +421,9 @@ void rcu_irq_exit(void) > >> WARN_ON_ONCE(rdtp->dynticks & 0x1); > >> > >> /* If the interrupt queued a callback, get out of dyntick mode. */ > >> - if (__this_cpu_read(rcu_sched_data.nxtlist) || > >> - __this_cpu_read(rcu_bh_data.nxtlist)) > >> + if (in_irq() && > >> + (__this_cpu_read(rcu_sched_data.nxtlist) || > >> + __this_cpu_read(rcu_bh_data.nxtlist))) > >> set_need_resched(); > >> } > >> > >> got: > >> > >> [ 34.384490] cpu_dev_init done > >> [ 86.656322] memory_dev_init done > >> > >> > >> after > >> > >> commit fcfc28801f5b3b9c70616fc57e3a2c6f52014e14 > >> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > >> Date: Mon May 16 14:27:31 2011 -0700 > >> > >> rcu: Make rcu_enter_nohz() pay attention to nesting > >> > >> The old version of rcu_enter_nohz() forced RCU into nohz mode even if > >> the nesting count was non-zero. This change causes rcu_enter_nohz() > >> to hold off for non-zero nesting counts. > >> > >> Signed-off-by: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> > >> > >> diff --git a/kernel/rcutree.c b/kernel/rcutree.c > >> index 195b3a3..99c6038 100644 > >> --- a/kernel/rcutree.c > >> +++ b/kernel/rcutree.c > >> @@ -324,8 +324,8 @@ void rcu_enter_nohz(void) > >> smp_mb(); /* CPUs seeing ++ must see prior RCU read-side crit sects */ > >> local_irq_save(flags); > >> rdtp = &__get_cpu_var(rcu_dynticks); > >> - rdtp->dynticks++; > >> - rdtp->dynticks_nesting--; > >> + if (--rdtp->dynticks_nesting == 0) > >> + rdtp->dynticks++; > >> WARN_ON_ONCE(rdtp->dynticks & 0x1); > >> local_irq_restore(flags); > >> } > >> > >> got: > >> > >> [ 32.414049] cpu_dev_init done > >> [ 38.237979] memory_dev_init done > > > > So this is best for you -- where we have done all but the last commit > > of restoring "Decrease memory-barrier usage based on semi-formal proof". > > It makes sense that this one would help, as it is eliminating delays > > due to misnesting. These delays are not hangs, as force_quiescent_state() > > will eventually force the right thing to happen, but getting rid of these > > delays should indeed speed things up. > > > >> after: > >> commit bcd6e68330f893a81b3519ab3c5fc2bebbc9988c > >> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > >> Date: Tue Sep 7 10:38:22 2010 -0700 > >> > >> rcu: Decrease memory-barrier usage based on semi-formal proof > >> ... > >> > >> got: > >> > >> [ 32.447936] cpu_dev_init done > >> [ 111.027066] memory_dev_init done > > > > So there is something nasty in this patch. > > > > Not seeing it immediately, but it does give me some focus for both > > code inspection and possible diagnostic patches. > > > >> after > >> > >> commit fbb753fb9dd62318d27fa070c686423ced139817 > >> Author: Paul E. McKenney <paul.mckenney@linaro.org> > >> Date: Wed May 11 05:33:33 2011 -0700 > >> > >> atomic: Add atomic_or() > >> > >> An atomic_or() function is needed by TREE_RCU to avoid deadlock, so > >> add a generic version. > >> > >> Signed-off-by: Paul E. 
McKenney <paul.mckenney@linaro.org> > >> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > >> > >> diff --git a/include/linux/atomic.h b/include/linux/atomic.h > >> index 96c038e..ee456c7 100644 > >> --- a/include/linux/atomic.h > >> +++ b/include/linux/atomic.h > >> @@ -34,4 +34,17 @@ static inline int atomic_inc_not_zero_hint(atomic_t *v, int hint) > >> } > >> #endif > >> > >> +#ifndef CONFIG_ARCH_HAS_ATOMIC_OR > >> +static inline void atomic_or(int i, atomic_t *v) > >> +{ > >> + int old; > >> + int new; > >> + > >> + do { > >> + old = atomic_read(v); > >> + new = old | i; > >> + } while (atomic_cmpxchg(v, old, new) != old); > >> +} > >> +#endif /* #ifndef CONFIG_ARCH_HAS_ATOMIC_OR */ > >> + > >> #endif /* _LINUX_ATOMIC_H */ > >> > >> got: > >> > >> [ 32.803704] cpu_dev_init done > >> [ 99.171292] memory_dev_init done > > > > So the difference between these two is noise, I hope. Adding a static > > inline function that is not used should not have an effect on performance. > > Still, the difference between 6 seconds and 60 seconds rises far above > > this noise level, so the big differences are likely quite real. > > could be softirq to kthread change... ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-25 4:46 ` Paul E. McKenney @ 2011-05-25 7:24 ` Ingo Molnar 2011-05-25 20:48 ` Paul E. McKenney 0 siblings, 1 reply; 45+ messages in thread From: Ingo Molnar @ 2011-05-25 7:24 UTC (permalink / raw) To: Paul E. McKenney; +Cc: Yinghai Lu, linux-kernel, mingo, hpa, tglx * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > On Tue, May 24, 2011 at 05:13:06PM -0700, Yinghai Lu wrote: > > On 05/24/2011 05:05 PM, Paul E. McKenney wrote: > > > On Tue, May 24, 2011 at 02:23:45PM -0700, Yinghai Lu wrote: > > >> On 05/23/2011 06:35 PM, Paul E. McKenney wrote: > > >>> On Mon, May 23, 2011 at 06:26:23PM -0700, Yinghai Lu wrote: > > >>>> On 05/23/2011 06:18 PM, Paul E. McKenney wrote: > > >>>> > > >>>>> OK, so it looks like I need to get this out of the way in order to track > > >>>>> down the delays. Or does reverting PeterZ's patch get you a stable > > >>>>> system, but with the longish delays in memory_dev_init()? If the latter, > > >>>>> it might be more productive to handle the two problems separately. > > >>>>> > > >>>>> For whatever it is worth, I do see about 5% increase in grace-period > > >>>>> duration when switching to kthreads. This is acceptable -- your > > >>>>> 30x increase clearly is completely unacceptable and must be fixed. > > >>>>> Other than that, the main thing that affects grace period duration is > > >>>>> the setting of CONFIG_HZ -- the smaller the HZ value, the longer the > > >>>>> grace-period duration. > > >>>> > > >>>> for my 1024g system when memory hotadd is enabled in kernel config: > > >>>> 1. current linus tree + tip tree: memory_dev_init will take about 100s. > > >>>> 2. current linus tree + tip tree + your tree - Peterz patch: > > >>>> a. on fedora 14 gcc: will cost about 4s: like old times > > >>>> b. on opensuse 11.3 gcc: will cost about 10s. > > >>> > > >>> So some patch in my tree that is not yet in tip makes things better? 
> > >>> > > >>> If so, could you please see which one? Maybe that would give me a hint > > >>> that could make things better on opensuse 11.3 as well. > > >> > > >> today's tip: > > >> > > >> [ 31.795597] cpu_dev_init done > > >> [ 40.930202] memory_dev_init done > > > > > > One other question... What is memory_dev_init() doing to wait for so > > > many RCU grace periods? (Yes, I do need to fix the slowdowns in any > > > case, but I am curious.) > > > > looks like it register some in sysfs > > Use of synchronize_rcu() for unregistering would make sense, but > I don't understand why it is needed when registering. I guess writing a patch to remove it would be welcome by the sysfs folks - or some subtle reason would be pointed out (which reason could thus be added to the code in a comment). Understanding the nondeterminism of grace periods would be extremely nice though, there *are* workloads that use rcu syncs rather frequently, and we have probably regressed them. Thanks, Ingo ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-25 7:24 ` Ingo Molnar @ 2011-05-25 20:48 ` Paul E. McKenney 0 siblings, 0 replies; 45+ messages in thread From: Paul E. McKenney @ 2011-05-25 20:48 UTC (permalink / raw) To: Ingo Molnar; +Cc: Yinghai Lu, linux-kernel, mingo, hpa, tglx On Wed, May 25, 2011 at 09:24:06AM +0200, Ingo Molnar wrote: > > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > > On Tue, May 24, 2011 at 05:13:06PM -0700, Yinghai Lu wrote: > > > On 05/24/2011 05:05 PM, Paul E. McKenney wrote: > > > > On Tue, May 24, 2011 at 02:23:45PM -0700, Yinghai Lu wrote: > > > >> On 05/23/2011 06:35 PM, Paul E. McKenney wrote: > > > >>> On Mon, May 23, 2011 at 06:26:23PM -0700, Yinghai Lu wrote: > > > >>>> On 05/23/2011 06:18 PM, Paul E. McKenney wrote: > > > >>>> > > > >>>>> OK, so it looks like I need to get this out of the way in order to track > > > >>>>> down the delays. Or does reverting PeterZ's patch get you a stable > > > >>>>> system, but with the longish delays in memory_dev_init()? If the latter, > > > >>>>> it might be more productive to handle the two problems separately. > > > >>>>> > > > >>>>> For whatever it is worth, I do see about 5% increase in grace-period > > > >>>>> duration when switching to kthreads. This is acceptable -- your > > > >>>>> 30x increase clearly is completely unacceptable and must be fixed. > > > >>>>> Other than that, the main thing that affects grace period duration is > > > >>>>> the setting of CONFIG_HZ -- the smaller the HZ value, the longer the > > > >>>>> grace-period duration. > > > >>>> > > > >>>> for my 1024g system when memory hotadd is enabled in kernel config: > > > >>>> 1. current linus tree + tip tree: memory_dev_init will take about 100s. > > > >>>> 2. current linus tree + tip tree + your tree - Peterz patch: > > > >>>> a. on fedora 14 gcc: will cost about 4s: like old times > > > >>>> b. on opensuse 11.3 gcc: will cost about 10s. 
> > > >>> > > > >>> So some patch in my tree that is not yet in tip makes things better? > > > >>> > > > >>> If so, could you please see which one? Maybe that would give me a hint > > > >>> that could make things better on opensuse 11.3 as well. > > > >> > > > >> today's tip: > > > >> > > > >> [ 31.795597] cpu_dev_init done > > > >> [ 40.930202] memory_dev_init done > > > > > > > > One other question... What is memory_dev_init() doing to wait for so > > > > many RCU grace periods? (Yes, I do need to fix the slowdowns in any > > > > case, but I am curious.) > > > > > > looks like it register some in sysfs > > > > Use of synchronize_rcu() for unregistering would make sense, but > > I don't understand why it is needed when registering. > > I guess writing a patch to remove it would be welcome by the sysfs folks - or > some subtle reason would be pointed out (which reason could thus be added to > the code in a comment). > > Understanding the nondeterminism of grace periods would be extremely nice > though, there *are* workloads that use rcu syncs rather frequently, and we have > probably regressed them. Agreed, if I can help people speed up sysfs creation, that would be good, but avoiding/fixing RCU grace-period performance regressions is also a good thing. Thanx, Paul ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  2011-05-25 0:13 ` Yinghai Lu
  2011-05-25 4:46 ` Paul E. McKenney
@ 2011-05-25 7:18 ` Ingo Molnar
  1 sibling, 0 replies; 45+ messages in thread
From: Ingo Molnar @ 2011-05-25 7:18 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: paulmck, linux-kernel, mingo, hpa, tglx

* Yinghai Lu <yinghai@kernel.org> wrote:

> >> got:
> >>
> >> [ 32.803704] cpu_dev_init done
> >> [ 99.171292] memory_dev_init done
> >
> > So the difference between these two is noise, I hope. Adding a static
> > inline function that is not used should not have an effect on performance.
> > Still, the difference between 6 seconds and 60 seconds rises far above
> > this noise level, so the big differences are likely quite real.
>
> could be softirq to kthread change...

Softirq processing can be pretty nondeterministic (for example there's no
guarantee that we process all softirqs) - but kthreads ought to be pretty
deterministic. Weird.

Thanks,

	Ingo

^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  2011-05-25 0:05 ` Paul E. McKenney
  2011-05-25 0:13 ` Yinghai Lu
@ 2011-05-25 0:16 ` Paul E. McKenney
  1 sibling, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2011-05-25 0:16 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo

On Tue, May 24, 2011 at 05:05:30PM -0700, Paul E. McKenney wrote:
> On Tue, May 24, 2011 at 02:23:45PM -0700, Yinghai Lu wrote:
> > On 05/23/2011 06:35 PM, Paul E. McKenney wrote:

[ . . . ]

> > after:
> > commit bcd6e68330f893a81b3519ab3c5fc2bebbc9988c
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date: Tue Sep 7 10:38:22 2010 -0700
> >
> > rcu: Decrease memory-barrier usage based on semi-formal proof
> > ...
> >
> > got:
> >
> > [ 32.447936] cpu_dev_init done
> > [ 111.027066] memory_dev_init done
>
> So there is something nasty in this patch.
>
> Not seeing it immediately, but it does give me some focus for both
> code inspection and possible diagnostic patches.

Actually, I already do have some debugfs stuff that should help me spot
the problem. So could you please build both with and without this commit
enabling CONFIG_TRACE_RCU and send me the contents of the debugfs files
rcu/rcuhier and rcu/rcudata in both cases?

This will show me the results of the full boot path. If this turns out
to drown out the differences, I will create a more focused diagnostic
patch.

							Thanx, Paul

^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-24 21:23 ` Yinghai Lu 2011-05-25 0:05 ` Paul E. McKenney @ 2011-05-25 0:10 ` Yinghai Lu 2011-05-25 4:52 ` Paul E. McKenney 1 sibling, 1 reply; 45+ messages in thread From: Yinghai Lu @ 2011-05-25 0:10 UTC (permalink / raw) To: paulmck; +Cc: linux-kernel, mingo, hpa, tglx, mingo On 05/24/2011 02:23 PM, Yinghai Lu wrote: > On 05/23/2011 06:35 PM, Paul E. McKenney wrote: >> On Mon, May 23, 2011 at 06:26:23PM -0700, Yinghai Lu wrote: >>> On 05/23/2011 06:18 PM, Paul E. McKenney wrote: >>> >>>> OK, so it looks like I need to get this out of the way in order to track >>>> down the delays. Or does reverting PeterZ's patch get you a stable >>>> system, but with the longish delays in memory_dev_init()? If the latter, >>>> it might be more productive to handle the two problems separately. >>>> >>>> For whatever it is worth, I do see about 5% increase in grace-period >>>> duration when switching to kthreads. This is acceptable -- your >>>> 30x increase clearly is completely unacceptable and must be fixed. >>>> Other than that, the main thing that affects grace period duration is >>>> the setting of CONFIG_HZ -- the smaller the HZ value, the longer the >>>> grace-period duration. >>> >>> for my 1024g system when memory hotadd is enabled in kernel config: >>> 1. current linus tree + tip tree: memory_dev_init will take about 100s. >>> 2. current linus tree + tip tree + your tree - Peterz patch: >>> a. on fedora 14 gcc: will cost about 4s: like old times >>> b. on opensuse 11.3 gcc: will cost about 10s. >> >> So some patch in my tree that is not yet in tip makes things better? >> >> If so, could you please see which one? Maybe that would give me a hint >> that could make things better on opensuse 11.3 as well. 
> > today's tip: > > [ 31.795597] cpu_dev_init done > [ 40.930202] memory_dev_init done > another boot from tip got: [ 35.211927] cpu_dev_init done [ 136.053698] memory_dev_init done wonder if you can have clean revert for commit a26ac2455ffcf3be5c6ef92bc6df7182700f2114 > Author: Paul E. McKenney <paul.mckenney@linaro.org> > Date: Wed Jan 12 14:10:23 2011 -0800 > > rcu: move TREE_RCU from softirq to kthread > > If RCU priority boosting is to be meaningful, callback invocation must > be boosted in addition to preempted RCU readers. Otherwise, in presence > of CPU real-time threads, the grace period ends, but the callbacks don't > get invoked. If the callbacks don't get invoked, the associated memory > doesn't get freed, so the system is still subject to OOM. > > But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit > moves the callback invocations to a kthread, which can be boosted easily. > > Also add comments and properly synchronized all accesses to > rcu_cpu_kthread_task, as suggested by Lai Jiangshan. > > Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > Reviewed-by: Josh Triplett <josh@joshtriplett.org> Thanks Yinghai Lu ^ permalink raw reply [flat|nested] 45+ messages in thread
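The two boots from tip quoted above differ by roughly a factor of ten in how long memory_dev_init takes: about 9.1 seconds in the first and about 100.8 seconds in the second. A small sketch of how that interval can be read off a pair of printk lines follows. It assumes each line starts with the usual "[ seconds.microseconds]" timestamp, and the function names are hypothetical helpers, not kernel or tool APIs:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse the leading "[ seconds.micros]" timestamp of a printk line.
 * Returns 0 on success, -1 if the line does not start with one.
 */
int printk_timestamp(const char *line, double *seconds)
{
	return sscanf(line, " [%lf]", seconds) == 1 ? 0 : -1;
}

/* Seconds elapsed between two timestamped log lines. */
double elapsed_between(const char *first, const char *second)
{
	double t0 = 0.0, t1 = 0.0;
	int ok0 = printk_timestamp(first, &t0);
	int ok1 = printk_timestamp(second, &t1);

	/* Both lines must carry a timestamp for the difference to mean anything. */
	assert(ok0 == 0 && ok1 == 0);
	return t1 - t0;
}
```

Feeding it the "cpu_dev_init done" and "memory_dev_init done" lines of one boot gives the time spent between those two messages.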
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-25 0:10 ` Yinghai Lu @ 2011-05-25 4:52 ` Paul E. McKenney 2011-05-25 7:27 ` Ingo Molnar 2011-05-25 22:15 ` Yinghai Lu 0 siblings, 2 replies; 45+ messages in thread From: Paul E. McKenney @ 2011-05-25 4:52 UTC (permalink / raw) To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo On Tue, May 24, 2011 at 05:10:11PM -0700, Yinghai Lu wrote: > On 05/24/2011 02:23 PM, Yinghai Lu wrote: > > On 05/23/2011 06:35 PM, Paul E. McKenney wrote: > >> On Mon, May 23, 2011 at 06:26:23PM -0700, Yinghai Lu wrote: > >>> On 05/23/2011 06:18 PM, Paul E. McKenney wrote: > >>> > >>>> OK, so it looks like I need to get this out of the way in order to track > >>>> down the delays. Or does reverting PeterZ's patch get you a stable > >>>> system, but with the longish delays in memory_dev_init()? If the latter, > >>>> it might be more productive to handle the two problems separately. > >>>> > >>>> For whatever it is worth, I do see about 5% increase in grace-period > >>>> duration when switching to kthreads. This is acceptable -- your > >>>> 30x increase clearly is completely unacceptable and must be fixed. > >>>> Other than that, the main thing that affects grace period duration is > >>>> the setting of CONFIG_HZ -- the smaller the HZ value, the longer the > >>>> grace-period duration. > >>> > >>> for my 1024g system when memory hotadd is enabled in kernel config: > >>> 1. current linus tree + tip tree: memory_dev_init will take about 100s. > >>> 2. current linus tree + tip tree + your tree - Peterz patch: > >>> a. on fedora 14 gcc: will cost about 4s: like old times > >>> b. on opensuse 11.3 gcc: will cost about 10s. > >> > >> So some patch in my tree that is not yet in tip makes things better? > >> > >> If so, could you please see which one? Maybe that would give me a hint > >> that could make things better on opensuse 11.3 as well. 
> > > > today's tip: > > > > [ 31.795597] cpu_dev_init done > > [ 40.930202] memory_dev_init done > > > > another boot from tip got: > > [ 35.211927] cpu_dev_init done > [ 136.053698] memory_dev_init done > > wonder if you can have clean revert for > > commit a26ac2455ffcf3be5c6ef92bc6df7182700f2114 > > Author: Paul E. McKenney <paul.mckenney@linaro.org> > > Date: Wed Jan 12 14:10:23 2011 -0800 > > > > rcu: move TREE_RCU from softirq to kthread > > > > If RCU priority boosting is to be meaningful, callback invocation must > > be boosted in addition to preempted RCU readers. Otherwise, in presence > > of CPU real-time threads, the grace period ends, but the callbacks don't > > get invoked. If the callbacks don't get invoked, the associated memory > > doesn't get freed, so the system is still subject to OOM. > > > > But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit > > moves the callback invocations to a kthread, which can be boosted easily. > > > > Also add comments and properly synchronized all accesses to > > rcu_cpu_kthread_task, as suggested by Lai Jiangshan. > > > > Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > Reviewed-by: Josh Triplett <josh@joshtriplett.org> There is a new branch yinghai.2011.05.24a on: git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git Or will be as soon as kernel.org updates its mirrors. I am not sure I could call this "clean", but it does revert that commit and 11 of the subsequent commits that depend on it. It does build, and I will test it once my currently running tests complete. Thanx, Paul ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-25 4:52 ` Paul E. McKenney @ 2011-05-25 7:27 ` Ingo Molnar 2011-05-25 20:47 ` Paul E. McKenney 2011-05-25 22:15 ` Yinghai Lu 1 sibling, 1 reply; 45+ messages in thread From: Ingo Molnar @ 2011-05-25 7:27 UTC (permalink / raw) To: Paul E. McKenney; +Cc: Yinghai Lu, linux-kernel, mingo, hpa, tglx * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > On Tue, May 24, 2011 at 05:10:11PM -0700, Yinghai Lu wrote: > > On 05/24/2011 02:23 PM, Yinghai Lu wrote: > > > On 05/23/2011 06:35 PM, Paul E. McKenney wrote: > > >> On Mon, May 23, 2011 at 06:26:23PM -0700, Yinghai Lu wrote: > > >>> On 05/23/2011 06:18 PM, Paul E. McKenney wrote: > > >>> > > >>>> OK, so it looks like I need to get this out of the way in order to track > > >>>> down the delays. Or does reverting PeterZ's patch get you a stable > > >>>> system, but with the longish delays in memory_dev_init()? If the latter, > > >>>> it might be more productive to handle the two problems separately. > > >>>> > > >>>> For whatever it is worth, I do see about 5% increase in grace-period > > >>>> duration when switching to kthreads. This is acceptable -- your > > >>>> 30x increase clearly is completely unacceptable and must be fixed. > > >>>> Other than that, the main thing that affects grace period duration is > > >>>> the setting of CONFIG_HZ -- the smaller the HZ value, the longer the > > >>>> grace-period duration. > > >>> > > >>> for my 1024g system when memory hotadd is enabled in kernel config: > > >>> 1. current linus tree + tip tree: memory_dev_init will take about 100s. > > >>> 2. current linus tree + tip tree + your tree - Peterz patch: > > >>> a. on fedora 14 gcc: will cost about 4s: like old times > > >>> b. on opensuse 11.3 gcc: will cost about 10s. > > >> > > >> So some patch in my tree that is not yet in tip makes things better? > > >> > > >> If so, could you please see which one? 
Maybe that would give me a hint > > >> that could make things better on opensuse 11.3 as well. > > > > > > today's tip: > > > > > > [ 31.795597] cpu_dev_init done > > > [ 40.930202] memory_dev_init done > > > > > > > another boot from tip got: > > > > [ 35.211927] cpu_dev_init done > > [ 136.053698] memory_dev_init done > > > > wonder if you can have clean revert for > > > > commit a26ac2455ffcf3be5c6ef92bc6df7182700f2114 > > > Author: Paul E. McKenney <paul.mckenney@linaro.org> > > > Date: Wed Jan 12 14:10:23 2011 -0800 > > > > > > rcu: move TREE_RCU from softirq to kthread > > > > > > If RCU priority boosting is to be meaningful, callback invocation must > > > be boosted in addition to preempted RCU readers. Otherwise, in presence > > > of CPU real-time threads, the grace period ends, but the callbacks don't > > > get invoked. If the callbacks don't get invoked, the associated memory > > > doesn't get freed, so the system is still subject to OOM. > > > > > > But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit > > > moves the callback invocations to a kthread, which can be boosted easily. > > > > > > Also add comments and properly synchronized all accesses to > > > rcu_cpu_kthread_task, as suggested by Lai Jiangshan. > > > > > > Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > > Reviewed-by: Josh Triplett <josh@joshtriplett.org> > > There is a new branch yinghai.2011.05.24a on: > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git > > Or will be as soon as kernel.org updates its mirrors. > > I am not sure I could call this "clean", but it does revert that commit > and 11 of the subsequent commits that depend on it. It does build, > and I will test it once my currently running tests complete. Given that this is about a 1-2 minute delays with 1 *terabyte* of RAM, the per gigabyte delay is like 60-120 msecs, right? 
So it's not a regression we are absolutely forced to address via a quick revert, debugging it would be nicer. There's something we don't understand and that's arguably worse than having unresolved non-fatal bugs :-) We already fixed the worst problem via a revert, the semi-hang: so i don't think there's pressure to do other reverts - other than for diagnostic purposes, of course. Thanks, Ingo ^ permalink raw reply [flat|nested] 45+ messages in thread
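The per-gigabyte estimate above can be checked with a short calculation. The sketch below assumes the 1 terabyte machine is counted as 1024 gigabytes (Yinghai's "1024g system") and takes the delay as 60 to 120 seconds; the function name is hypothetical:

```c
#include <assert.h>

/* Delay per gigabyte in milliseconds, given a total delay in seconds. */
double delay_ms_per_gb(double total_delay_s, double ram_gb)
{
	assert(ram_gb > 0.0);
	return total_delay_s * 1000.0 / ram_gb;
}
```

With these inputs the result is about 59 ms per gigabyte for a 60 second delay and about 117 ms per gigabyte for a 120 second delay, which matches the rough 60-120 msecs figure.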
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-25 7:27 ` Ingo Molnar @ 2011-05-25 20:47 ` Paul E. McKenney 2011-05-25 20:52 ` Ingo Molnar 0 siblings, 1 reply; 45+ messages in thread From: Paul E. McKenney @ 2011-05-25 20:47 UTC (permalink / raw) To: Ingo Molnar; +Cc: Yinghai Lu, linux-kernel, mingo, hpa, tglx On Wed, May 25, 2011 at 09:27:42AM +0200, Ingo Molnar wrote: > > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > > On Tue, May 24, 2011 at 05:10:11PM -0700, Yinghai Lu wrote: > > > On 05/24/2011 02:23 PM, Yinghai Lu wrote: > > > > On 05/23/2011 06:35 PM, Paul E. McKenney wrote: > > > >> On Mon, May 23, 2011 at 06:26:23PM -0700, Yinghai Lu wrote: > > > >>> On 05/23/2011 06:18 PM, Paul E. McKenney wrote: > > > >>> > > > >>>> OK, so it looks like I need to get this out of the way in order to track > > > >>>> down the delays. Or does reverting PeterZ's patch get you a stable > > > >>>> system, but with the longish delays in memory_dev_init()? If the latter, > > > >>>> it might be more productive to handle the two problems separately. > > > >>>> > > > >>>> For whatever it is worth, I do see about 5% increase in grace-period > > > >>>> duration when switching to kthreads. This is acceptable -- your > > > >>>> 30x increase clearly is completely unacceptable and must be fixed. > > > >>>> Other than that, the main thing that affects grace period duration is > > > >>>> the setting of CONFIG_HZ -- the smaller the HZ value, the longer the > > > >>>> grace-period duration. > > > >>> > > > >>> for my 1024g system when memory hotadd is enabled in kernel config: > > > >>> 1. current linus tree + tip tree: memory_dev_init will take about 100s. > > > >>> 2. current linus tree + tip tree + your tree - Peterz patch: > > > >>> a. on fedora 14 gcc: will cost about 4s: like old times > > > >>> b. on opensuse 11.3 gcc: will cost about 10s. 
> > > >> > > > >> So some patch in my tree that is not yet in tip makes things better? > > > >> > > > >> If so, could you please see which one? Maybe that would give me a hint > > > >> that could make things better on opensuse 11.3 as well. > > > > > > > > today's tip: > > > > > > > > [ 31.795597] cpu_dev_init done > > > > [ 40.930202] memory_dev_init done > > > > > > > > > > another boot from tip got: > > > > > > [ 35.211927] cpu_dev_init done > > > [ 136.053698] memory_dev_init done > > > > > > wonder if you can have clean revert for > > > > > > commit a26ac2455ffcf3be5c6ef92bc6df7182700f2114 > > > > Author: Paul E. McKenney <paul.mckenney@linaro.org> > > > > Date: Wed Jan 12 14:10:23 2011 -0800 > > > > > > > > rcu: move TREE_RCU from softirq to kthread > > > > > > > > If RCU priority boosting is to be meaningful, callback invocation must > > > > be boosted in addition to preempted RCU readers. Otherwise, in presence > > > > of CPU real-time threads, the grace period ends, but the callbacks don't > > > > get invoked. If the callbacks don't get invoked, the associated memory > > > > doesn't get freed, so the system is still subject to OOM. > > > > > > > > But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit > > > > moves the callback invocations to a kthread, which can be boosted easily. > > > > > > > > Also add comments and properly synchronized all accesses to > > > > rcu_cpu_kthread_task, as suggested by Lai Jiangshan. > > > > > > > > Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > > > Reviewed-by: Josh Triplett <josh@joshtriplett.org> > > > > There is a new branch yinghai.2011.05.24a on: > > > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git > > > > Or will be as soon as kernel.org updates its mirrors. > > > > I am not sure I could call this "clean", but it does revert that commit > > and 11 of the subsequent commits that depend on it. 
It does build, > > and I will test it once my currently running tests complete. > > Given that this is about a 1-2 minute delays with 1 *terabyte* of RAM, the per > gigabyte delay is like 60-120 msecs, right? > > So it's not a regression we are absolutely forced to address via a quick > revert, debugging it would be nicer. There's something we don't understand and > that's arguably worse than having unresolved non-fatal bugs :-) And my attempted revert results in test failures in any case. :-( > We already fixed the worst problem via a revert, the semi-hang: so i don't > think there's pressure to do other reverts - other than for diagnostic > purposes, of course. Given that I have to debug in any case, I am happier debugging in the forward direction rather than in the backwards direction. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  2011-05-25 20:47 ` Paul E. McKenney
@ 2011-05-25 20:52 ` Ingo Molnar
  0 siblings, 0 replies; 45+ messages in thread
From: Ingo Molnar @ 2011-05-25 20:52 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Yinghai Lu, linux-kernel, mingo, hpa, tglx

* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> > We already fixed the worst problem via a revert, the semi-hang:
> > so i don't think there's pressure to do other reverts - other
> > than for diagnostic purposes, of course.
>
> Given that I have to debug in any case, I am happier debugging in
> the forward direction rather than in the backwards direction. ;-)

hehe, fair enough :-)

Thanks,

	Ingo

^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-25 4:52 ` Paul E. McKenney 2011-05-25 7:27 ` Ingo Molnar @ 2011-05-25 22:15 ` Yinghai Lu 2011-05-25 22:34 ` Paul E. McKenney 1 sibling, 1 reply; 45+ messages in thread From: Yinghai Lu @ 2011-05-25 22:15 UTC (permalink / raw) To: paulmck; +Cc: linux-kernel, mingo, hpa, tglx, mingo On 05/24/2011 09:52 PM, Paul E. McKenney wrote: > On Tue, May 24, 2011 at 05:10:11PM -0700, Yinghai Lu wrote: >> On 05/24/2011 02:23 PM, Yinghai Lu wrote: >>> On 05/23/2011 06:35 PM, Paul E. McKenney wrote: >>>> On Mon, May 23, 2011 at 06:26:23PM -0700, Yinghai Lu wrote: >>>>> On 05/23/2011 06:18 PM, Paul E. McKenney wrote: >>>>> >>>>>> OK, so it looks like I need to get this out of the way in order to track >>>>>> down the delays. Or does reverting PeterZ's patch get you a stable >>>>>> system, but with the longish delays in memory_dev_init()? If the latter, >>>>>> it might be more productive to handle the two problems separately. >>>>>> >>>>>> For whatever it is worth, I do see about 5% increase in grace-period >>>>>> duration when switching to kthreads. This is acceptable -- your >>>>>> 30x increase clearly is completely unacceptable and must be fixed. >>>>>> Other than that, the main thing that affects grace period duration is >>>>>> the setting of CONFIG_HZ -- the smaller the HZ value, the longer the >>>>>> grace-period duration. >>>>> >>>>> for my 1024g system when memory hotadd is enabled in kernel config: >>>>> 1. current linus tree + tip tree: memory_dev_init will take about 100s. >>>>> 2. current linus tree + tip tree + your tree - Peterz patch: >>>>> a. on fedora 14 gcc: will cost about 4s: like old times >>>>> b. on opensuse 11.3 gcc: will cost about 10s. >>>> >>>> So some patch in my tree that is not yet in tip makes things better? >>>> >>>> If so, could you please see which one? Maybe that would give me a hint >>>> that could make things better on opensuse 11.3 as well. 
>>> >>> today's tip: >>> >>> [ 31.795597] cpu_dev_init done >>> [ 40.930202] memory_dev_init done >>> >> >> another boot from tip got: >> >> [ 35.211927] cpu_dev_init done >> [ 136.053698] memory_dev_init done >> >> wonder if you can have clean revert for >> >> commit a26ac2455ffcf3be5c6ef92bc6df7182700f2114 >>> Author: Paul E. McKenney <paul.mckenney@linaro.org> >>> Date: Wed Jan 12 14:10:23 2011 -0800 >>> >>> rcu: move TREE_RCU from softirq to kthread >>> >>> If RCU priority boosting is to be meaningful, callback invocation must >>> be boosted in addition to preempted RCU readers. Otherwise, in presence >>> of CPU real-time threads, the grace period ends, but the callbacks don't >>> get invoked. If the callbacks don't get invoked, the associated memory >>> doesn't get freed, so the system is still subject to OOM. >>> >>> But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit >>> moves the callback invocations to a kthread, which can be boosted easily. >>> >>> Also add comments and properly synchronized all accesses to >>> rcu_cpu_kthread_task, as suggested by Lai Jiangshan. >>> >>> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> >>> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> >>> Reviewed-by: Josh Triplett <josh@joshtriplett.org> > > There is a new branch yinghai.2011.05.24a on: > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git > > Or will be as soon as kernel.org updates its mirrors. > > I am not sure I could call this "clean", but it does revert that commit > and 11 of the subsequent commits that depend on it. It does build, > and I will test it once my currently running tests complete. yes, with those revert, there is no delay in 10 times booting. Thanks Yinghai ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-25 22:15 ` Yinghai Lu @ 2011-05-25 22:34 ` Paul E. McKenney 2011-05-25 22:49 ` Yinghai Lu 0 siblings, 1 reply; 45+ messages in thread From: Paul E. McKenney @ 2011-05-25 22:34 UTC (permalink / raw) To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo On Wed, May 25, 2011 at 03:15:50PM -0700, Yinghai Lu wrote: > On 05/24/2011 09:52 PM, Paul E. McKenney wrote: > > On Tue, May 24, 2011 at 05:10:11PM -0700, Yinghai Lu wrote: > >> On 05/24/2011 02:23 PM, Yinghai Lu wrote: > >>> On 05/23/2011 06:35 PM, Paul E. McKenney wrote: > >>>> On Mon, May 23, 2011 at 06:26:23PM -0700, Yinghai Lu wrote: > >>>>> On 05/23/2011 06:18 PM, Paul E. McKenney wrote: > >>>>> > >>>>>> OK, so it looks like I need to get this out of the way in order to track > >>>>>> down the delays. Or does reverting PeterZ's patch get you a stable > >>>>>> system, but with the longish delays in memory_dev_init()? If the latter, > >>>>>> it might be more productive to handle the two problems separately. > >>>>>> > >>>>>> For whatever it is worth, I do see about 5% increase in grace-period > >>>>>> duration when switching to kthreads. This is acceptable -- your > >>>>>> 30x increase clearly is completely unacceptable and must be fixed. > >>>>>> Other than that, the main thing that affects grace period duration is > >>>>>> the setting of CONFIG_HZ -- the smaller the HZ value, the longer the > >>>>>> grace-period duration. > >>>>> > >>>>> for my 1024g system when memory hotadd is enabled in kernel config: > >>>>> 1. current linus tree + tip tree: memory_dev_init will take about 100s. > >>>>> 2. current linus tree + tip tree + your tree - Peterz patch: > >>>>> a. on fedora 14 gcc: will cost about 4s: like old times > >>>>> b. on opensuse 11.3 gcc: will cost about 10s. > >>>> > >>>> So some patch in my tree that is not yet in tip makes things better? > >>>> > >>>> If so, could you please see which one? 
Maybe that would give me a hint > >>>> that could make things better on opensuse 11.3 as well. > >>> > >>> today's tip: > >>> > >>> [ 31.795597] cpu_dev_init done > >>> [ 40.930202] memory_dev_init done > >>> > >> > >> another boot from tip got: > >> > >> [ 35.211927] cpu_dev_init done > >> [ 136.053698] memory_dev_init done > >> > >> wonder if you can have clean revert for > >> > >> commit a26ac2455ffcf3be5c6ef92bc6df7182700f2114 > >>> Author: Paul E. McKenney <paul.mckenney@linaro.org> > >>> Date: Wed Jan 12 14:10:23 2011 -0800 > >>> > >>> rcu: move TREE_RCU from softirq to kthread > >>> > >>> If RCU priority boosting is to be meaningful, callback invocation must > >>> be boosted in addition to preempted RCU readers. Otherwise, in presence > >>> of CPU real-time threads, the grace period ends, but the callbacks don't > >>> get invoked. If the callbacks don't get invoked, the associated memory > >>> doesn't get freed, so the system is still subject to OOM. > >>> > >>> But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit > >>> moves the callback invocations to a kthread, which can be boosted easily. > >>> > >>> Also add comments and properly synchronized all accesses to > >>> rcu_cpu_kthread_task, as suggested by Lai Jiangshan. > >>> > >>> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> > >>> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > >>> Reviewed-by: Josh Triplett <josh@joshtriplett.org> > > > > There is a new branch yinghai.2011.05.24a on: > > > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git > > > > Or will be as soon as kernel.org updates its mirrors. > > > > I am not sure I could call this "clean", but it does revert that commit > > and 11 of the subsequent commits that depend on it. It does build, > > and I will test it once my currently running tests complete. > > yes, with those revert, there is no delay in 10 times booting. 
Unfortunately, there are rcutorture test failures with the revert... Thanx, Paul ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  2011-05-25 22:34 ` Paul E. McKenney
@ 2011-05-25 22:49 ` Yinghai Lu
  2011-05-26 1:13 ` Paul E. McKenney
  0 siblings, 1 reply; 45+ messages in thread
From: Yinghai Lu @ 2011-05-25 22:49 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel, mingo, hpa, tglx, mingo

On 05/25/2011 03:34 PM, Paul E. McKenney wrote:
> On Wed, May 25, 2011 at 03:15:50PM -0700, Yinghai Lu wrote:
>>> There is a new branch yinghai.2011.05.24a on:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
>>>
>>> Or will be as soon as kernel.org updates its mirrors.
>>>
>>> I am not sure I could call this "clean", but it does revert that commit
>>> and 11 of the subsequent commits that depend on it. It does build,
>>> and I will test it once my currently running tests complete.
>>
>> yes, with those revert, there is no delay in 10 times booting.
>
> Unfortunately, there are rcutorture test failures with the revert...

confused.

what is the next?

Yinghai

^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-25 22:49 ` Yinghai Lu @ 2011-05-26 1:13 ` Paul E. McKenney 2011-05-26 1:30 ` Paul E. McKenney 0 siblings, 1 reply; 45+ messages in thread From: Paul E. McKenney @ 2011-05-26 1:13 UTC (permalink / raw) To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo On Wed, May 25, 2011 at 03:49:25PM -0700, Yinghai Lu wrote: > On 05/25/2011 03:34 PM, Paul E. McKenney wrote: > > On Wed, May 25, 2011 at 03:15:50PM -0700, Yinghai Lu wrote: > >>> There is a new branch yinghai.2011.05.24a on: > >>> > >>> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git > >>> > >>> Or will be as soon as kernel.org updates its mirrors. > >>> > >>> I am not sure I could call this "clean", but it does revert that commit > >>> and 11 of the subsequent commits that depend on it. It does build, > >>> and I will test it once my currently running tests complete. > >> > >> yes, with those revert, there is no delay in 10 times booting. > > > > Unfortunately, there are rcutorture test failures with the revert... > > confused. Given what I had to do to generate the revert, not exactly a surprise, I am afraid. Just means that the resulting RCU sometimes fails to wait for all pre-existing readers, and rcutorture catches it. > what is the next? 1. I send you a patch that I hope will fix the softlockup you saw. I am testing this. 2. I am working on more detailed instrumentation, and will send a patch on that. 3. If time allows, break down the operations RCU is doing and test them in isolation. Other thoughts? Thanx, Paul ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-26 1:13 ` Paul E. McKenney @ 2011-05-26 1:30 ` Paul E. McKenney 2011-05-26 6:13 ` Ingo Molnar 2011-05-26 15:08 ` Yinghai Lu 0 siblings, 2 replies; 45+ messages in thread From: Paul E. McKenney @ 2011-05-26 1:30 UTC (permalink / raw) To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo On Wed, May 25, 2011 at 06:13:10PM -0700, Paul E. McKenney wrote: > On Wed, May 25, 2011 at 03:49:25PM -0700, Yinghai Lu wrote: > > On 05/25/2011 03:34 PM, Paul E. McKenney wrote: > > > On Wed, May 25, 2011 at 03:15:50PM -0700, Yinghai Lu wrote: > > >>> There is a new branch yinghai.2011.05.24a on: > > >>> > > >>> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git > > >>> > > >>> Or will be as soon as kernel.org updates its mirrors. > > >>> > > >>> I am not sure I could call this "clean", but it does revert that commit > > >>> and 11 of the subsequent commits that depend on it. It does build, > > >>> and I will test it once my currently running tests complete. > > >> > > >> yes, with those revert, there is no delay in 10 times booting. > > > > > > Unfortunately, there are rcutorture test failures with the revert... > > > > confused. > > Given what I had to do to generate the revert, not exactly a surprise, > I am afraid. Just means that the resulting RCU sometimes fails to > wait for all pre-existing readers, and rcutorture catches it. > > > what is the next? > > 1. I send you a patch that I hope will fix the softlockup > you saw. I am testing this. > > 2. I am working on more detailed instrumentation, and will > send a patch on that. > > 3. If time allows, break down the operations RCU is doing > and test them in isolation. > > Other thoughts? And here is patch #1. Could you please try applying this on top of Peter Zijlstra's patch to see if it gets rid of the softlockups you saw? 
							Thanx, Paul

------------------------------------------------------------------------

rcu: Start RCU kthreads in TASK_INTERRUPTIBLE state

Upon creation, kthreads are in TASK_UNINTERRUPTIBLE state, which can
result in softlockup warnings.  Because some of RCU's kthreads can
legitimately be idle indefinitely, start them in TASK_INTERRUPTIBLE
state in order to avoid those warnings.

Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index a1a8bb6..40aab8d 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1647,6 +1647,7 @@ static int __cpuinit rcu_spawn_one_cpu_kthread(int cpu)
 	if (IS_ERR(t))
 		return PTR_ERR(t);
 	kthread_bind(t, cpu);
+	set_task_state(t, TASK_INTERRUPTIBLE);
 	per_cpu(rcu_cpu_kthread_cpu, cpu) = cpu;
 	WARN_ON_ONCE(per_cpu(rcu_cpu_kthread_task, cpu) != NULL);
 	per_cpu(rcu_cpu_kthread_task, cpu) = t;
@@ -1754,6 +1755,7 @@ static int __cpuinit rcu_spawn_one_node_kthread(struct rcu_state *rsp,
 		if (IS_ERR(t))
 			return PTR_ERR(t);
 		raw_spin_lock_irqsave(&rnp->lock, flags);
+		set_task_state(t, TASK_INTERRUPTIBLE);
 		rnp->node_kthread_task = t;
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 		sp.sched_priority = 99;
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 049f278..a767b7d 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1295,6 +1295,7 @@ static int __cpuinit rcu_spawn_one_boost_kthread(struct rcu_state *rsp,
 	if (IS_ERR(t))
 		return PTR_ERR(t);
 	raw_spin_lock_irqsave(&rnp->lock, flags);
+	set_task_state(t, TASK_INTERRUPTIBLE);
 	rnp->boost_kthread_task = t;
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 	sp.sched_priority = RCU_KTHREAD_PRIO;
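As an aside, the reason the initial task state matters can be sketched in userspace C. This is not kernel code: it assumes the warnings in question are the "blocked for more than 120 seconds" reports quoted elsewhere in this thread, and that the detector behind them only looks at tasks sleeping uninterruptibly that have not been scheduled since its previous scan. The enum, struct, and looks_hung() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative task states; the values are made up for this sketch. */
enum sketch_state { SK_RUNNING, SK_INTERRUPTIBLE, SK_UNINTERRUPTIBLE };

/* Hypothetical, minimal stand-in for a task descriptor. */
struct sketch_task {
	enum sketch_state state;
	unsigned long switch_count;      /* context switches so far */
	unsigned long last_switch_count; /* value seen at the previous scan */
};

/*
 * Return true if this simplified detector would report the task:
 * it sleeps uninterruptibly and has not been scheduled since the
 * previous scan.  Interruptible sleepers are never reported, which
 * is the effect the patch relies on for idle kthreads.
 */
static bool looks_hung(struct sketch_task *t)
{
	assert(t != NULL);
	if (t->state != SK_UNINTERRUPTIBLE)
		return false;
	if (t->switch_count != t->last_switch_count) {
		/* The task ran since the last scan: remember and move on. */
		t->last_switch_count = t->switch_count;
		return false;
	}
	return true;
}
```

A freshly created thread that never runs keeps the same switch count from scan to scan, so in this model it is reported only while it sits in the uninterruptible state.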
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  2011-05-26  1:30 ` Paul E. McKenney
@ 2011-05-26  6:13   ` Ingo Molnar
  2011-05-26 14:25     ` Paul E. McKenney
  2011-05-26 15:08   ` Yinghai Lu
  1 sibling, 1 reply; 45+ messages in thread
From: Ingo Molnar @ 2011-05-26  6:13 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Yinghai Lu, linux-kernel, mingo, hpa, tglx


* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> rcu: Start RCU kthreads in TASK_INTERRUPTIBLE state
>
> Upon creation, kthreads are in TASK_UNINTERRUPTIBLE state, which can
> result in softlockup warnings.  Because some of RCU's kthreads can
> legitimately be idle indefinitely, start them in TASK_INTERRUPTIBLE
> state in order to avoid those warnings.
>
> Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

this should also solve load average artifacts - do not
TASK_UNINTERRUPTIBLE tasks skew the load upwards?

Thanks,

	Ingo
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  2011-05-26  6:13 ` Ingo Molnar
@ 2011-05-26 14:25   ` Paul E. McKenney
  2011-05-26 17:43     ` Paul E. McKenney
  0 siblings, 1 reply; 45+ messages in thread
From: Paul E. McKenney @ 2011-05-26 14:25 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Yinghai Lu, linux-kernel, mingo, hpa, tglx

On Thu, May 26, 2011 at 08:13:48AM +0200, Ingo Molnar wrote:
>
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
>
> > rcu: Start RCU kthreads in TASK_INTERRUPTIBLE state
> >
> > Upon creation, kthreads are in TASK_UNINTERRUPTIBLE state, which can
> > result in softlockup warnings.  Because some of RCU's kthreads can
> > legitimately be idle indefinitely, start them in TASK_INTERRUPTIBLE
> > state in order to avoid those warnings.
> >
> > Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>
> this should also solve load average artifacts - do not
> TASK_UNINTERRUPTIBLE tasks skew the load upwards?

Quite possibly -- in this case the artifacts would appear just after
boot, and would disappear as soon as the RCU kthreads had something
to do.

							Thanx, Paul
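The load-average question above can be illustrated with a small userspace sketch. It relies on the fact that Linux's load average counts tasks in uninterruptible sleep along with runnable ones; everything else here (the enum, its values, and count_load_contributors()) is a hypothetical simplification, not the scheduler's actual accounting code.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative task states; the values are made up for this sketch. */
enum sketch_state { SK_RUNNING, SK_INTERRUPTIBLE, SK_UNINTERRUPTIBLE };

/*
 * Count the tasks that would contribute to the load average in this
 * simplified model: runnable tasks plus uninterruptible sleepers.
 * Interruptible sleepers contribute nothing.
 */
static size_t count_load_contributors(const enum sketch_state *tasks, size_t n)
{
	size_t active = 0;

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (tasks[i] == SK_RUNNING || tasks[i] == SK_UNINTERRUPTIBLE)
			active++;
	}
	return active;
}
```

In this model, two idle kthreads left in the uninterruptible state add two to the count even though they do no work, and switching them to the interruptible state removes that contribution.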
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-26 14:25 ` Paul E. McKenney @ 2011-05-26 17:43 ` Paul E. McKenney 2011-05-26 20:26 ` Ingo Molnar 0 siblings, 1 reply; 45+ messages in thread From: Paul E. McKenney @ 2011-05-26 17:43 UTC (permalink / raw) To: Ingo Molnar; +Cc: Yinghai Lu, linux-kernel, mingo, hpa, tglx On Thu, May 26, 2011 at 07:25:53AM -0700, Paul E. McKenney wrote: > On Thu, May 26, 2011 at 08:13:48AM +0200, Ingo Molnar wrote: > > > > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > > > > rcu: Start RCU kthreads in TASK_INTERRUPTIBLE state > > > > > > Upon creation, kthreads are in TASK_UNINTERRUPTIBLE state, which can > > > result in softlockup warnings. Because some of RCU's kthreads can > > > legitimately be idle indefinitely, start them in TASK_INTERRUPTIBLE > > > state in order to avoid those warnings. > > > > > > Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > > > this should also solve load average artifacts - do not > > TASK_UNINTERRUPTIBLE tasks skew the load upwards? > > Quite possibly -- in this case the artifacts would appear just after > boot, and would disappear as soon as the RCU kthreads had something > to do. By the way, how would you like to proceed with the fixes thus far? I have put them on -rcu for -next testing, FWIW. Thanx, Paul ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-26 17:43 ` Paul E. McKenney @ 2011-05-26 20:26 ` Ingo Molnar 0 siblings, 0 replies; 45+ messages in thread From: Ingo Molnar @ 2011-05-26 20:26 UTC (permalink / raw) To: Paul E. McKenney; +Cc: Yinghai Lu, linux-kernel, mingo, hpa, tglx * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > > this should also solve load average artifacts - do not > > > TASK_UNINTERRUPTIBLE tasks skew the load upwards? > > > > Quite possibly -- in this case the artifacts would appear just > > after boot, and would disappear as soon as the RCU kthreads had > > something to do. > > By the way, how would you like to proceed with the fixes thus far? > I have put them on -rcu for -next testing, FWIW. I'd suggest to push them to me ASAP, they should get to Linus before -rc1. Thanks, Ingo ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-26 1:30 ` Paul E. McKenney 2011-05-26 6:13 ` Ingo Molnar @ 2011-05-26 15:08 ` Yinghai Lu 2011-05-26 16:28 ` Paul E. McKenney 1 sibling, 1 reply; 45+ messages in thread From: Yinghai Lu @ 2011-05-26 15:08 UTC (permalink / raw) To: paulmck; +Cc: linux-kernel, mingo, hpa, tglx, mingo On Wed, May 25, 2011 at 6:30 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > On Wed, May 25, 2011 at 06:13:10PM -0700, Paul E. McKenney wrote: >> On Wed, May 25, 2011 at 03:49:25PM -0700, Yinghai Lu wrote: >> > On 05/25/2011 03:34 PM, Paul E. McKenney wrote: >> > > On Wed, May 25, 2011 at 03:15:50PM -0700, Yinghai Lu wrote: >> > >>> There is a new branch yinghai.2011.05.24a on: >> > >>> >> > >>> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git >> > >>> >> > >>> Or will be as soon as kernel.org updates its mirrors. >> > >>> >> > >>> I am not sure I could call this "clean", but it does revert that commit >> > >>> and 11 of the subsequent commits that depend on it. It does build, >> > >>> and I will test it once my currently running tests complete. >> > >> >> > >> yes, with those revert, there is no delay in 10 times booting. >> > > >> > > Unfortunately, there are rcutorture test failures with the revert... >> > >> > confused. >> >> Given what I had to do to generate the revert, not exactly a surprise, >> I am afraid. Just means that the resulting RCU sometimes fails to >> wait for all pre-existing readers, and rcutorture catches it. >> >> > what is the next? >> >> 1. I send you a patch that I hope will fix the softlockup >> you saw. I am testing this. >> >> 2. I am working on more detailed instrumentation, and will >> send a patch on that. >> >> 3. If time allows, break down the operations RCU is doing >> and test them in isolation. >> >> Other thoughts? > > And here is patch #1. 
Could you please try applying this on top of > Peter Zijlstra's patch to see if it gets rid of the softlockups you saw? > > Thanx, Paul > > ------------------------------------------------------------------------ > > rcu: Start RCU kthreads in TASK_INTERRUPTIBLE state > > Upon creation, kthreads are in TASK_UNINTERRUPTIBLE state, which can > result in softlockup warnings. Because some of RCU's kthreads can > legitimately be idle indefinitely, start them in TASK_INTERRUPTIBLE > state in order to avoid those warnings. > Yes, it fixes the lock up warning. Thanks Yinghai ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-26 15:08 ` Yinghai Lu @ 2011-05-26 16:28 ` Paul E. McKenney 2011-05-28 1:04 ` Paul E. McKenney 0 siblings, 1 reply; 45+ messages in thread From: Paul E. McKenney @ 2011-05-26 16:28 UTC (permalink / raw) To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo On Thu, May 26, 2011 at 08:08:26AM -0700, Yinghai Lu wrote: > On Wed, May 25, 2011 at 6:30 PM, Paul E. McKenney > <paulmck@linux.vnet.ibm.com> wrote: > > On Wed, May 25, 2011 at 06:13:10PM -0700, Paul E. McKenney wrote: > >> On Wed, May 25, 2011 at 03:49:25PM -0700, Yinghai Lu wrote: > >> > On 05/25/2011 03:34 PM, Paul E. McKenney wrote: > >> > > On Wed, May 25, 2011 at 03:15:50PM -0700, Yinghai Lu wrote: > >> > >>> There is a new branch yinghai.2011.05.24a on: > >> > >>> > >> > >>> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git > >> > >>> > >> > >>> Or will be as soon as kernel.org updates its mirrors. > >> > >>> > >> > >>> I am not sure I could call this "clean", but it does revert that commit > >> > >>> and 11 of the subsequent commits that depend on it. It does build, > >> > >>> and I will test it once my currently running tests complete. > >> > >> > >> > >> yes, with those revert, there is no delay in 10 times booting. > >> > > > >> > > Unfortunately, there are rcutorture test failures with the revert... > >> > > >> > confused. > >> > >> Given what I had to do to generate the revert, not exactly a surprise, > >> I am afraid. Just means that the resulting RCU sometimes fails to > >> wait for all pre-existing readers, and rcutorture catches it. > >> > >> > what is the next? > >> > >> 1. I send you a patch that I hope will fix the softlockup > >> you saw. I am testing this. > >> > >> 2. I am working on more detailed instrumentation, and will > >> send a patch on that. > >> > >> 3. If time allows, break down the operations RCU is doing > >> and test them in isolation. 
> >> > >> Other thoughts? > > > > And here is patch #1. Could you please try applying this on top of > > Peter Zijlstra's patch to see if it gets rid of the softlockups you saw? > > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > rcu: Start RCU kthreads in TASK_INTERRUPTIBLE state > > > > Upon creation, kthreads are in TASK_UNINTERRUPTIBLE state, which can > > result in softlockup warnings. Because some of RCU's kthreads can > > legitimately be idle indefinitely, start them in TASK_INTERRUPTIBLE > > state in order to avoid those warnings. > > Yes, it fixes the lock up warning. Very good, I have added your Tested-by. Thanx, Paul ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-26 16:28 ` Paul E. McKenney @ 2011-05-28 1:04 ` Paul E. McKenney 2011-05-28 4:03 ` Yinghai Lu 0 siblings, 1 reply; 45+ messages in thread From: Paul E. McKenney @ 2011-05-28 1:04 UTC (permalink / raw) To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo On Thu, May 26, 2011 at 09:28:02AM -0700, Paul E. McKenney wrote: > On Thu, May 26, 2011 at 08:08:26AM -0700, Yinghai Lu wrote: > > On Wed, May 25, 2011 at 6:30 PM, Paul E. McKenney > > <paulmck@linux.vnet.ibm.com> wrote: > > > On Wed, May 25, 2011 at 06:13:10PM -0700, Paul E. McKenney wrote: > > >> On Wed, May 25, 2011 at 03:49:25PM -0700, Yinghai Lu wrote: > > >> > On 05/25/2011 03:34 PM, Paul E. McKenney wrote: > > >> > > On Wed, May 25, 2011 at 03:15:50PM -0700, Yinghai Lu wrote: > > >> > >>> There is a new branch yinghai.2011.05.24a on: > > >> > >>> > > >> > >>> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git > > >> > >>> > > >> > >>> Or will be as soon as kernel.org updates its mirrors. > > >> > >>> > > >> > >>> I am not sure I could call this "clean", but it does revert that commit > > >> > >>> and 11 of the subsequent commits that depend on it. It does build, > > >> > >>> and I will test it once my currently running tests complete. > > >> > >> > > >> > >> yes, with those revert, there is no delay in 10 times booting. > > >> > > > > >> > > Unfortunately, there are rcutorture test failures with the revert... > > >> > > > >> > confused. > > >> > > >> Given what I had to do to generate the revert, not exactly a surprise, > > >> I am afraid. Just means that the resulting RCU sometimes fails to > > >> wait for all pre-existing readers, and rcutorture catches it. > > >> > > >> > what is the next? > > >> > > >> 1. I send you a patch that I hope will fix the softlockup > > >> you saw. I am testing this. > > >> > > >> 2. 
I am working on more detailed instrumentation, and will > > >> send a patch on that. > > >> > > >> 3. If time allows, break down the operations RCU is doing > > >> and test them in isolation. > > >> > > >> Other thoughts? > > > > > > And here is patch #1. Could you please try applying this on top of > > > Peter Zijlstra's patch to see if it gets rid of the softlockups you saw? > > > > > > Thanx, Paul > > > > > > ------------------------------------------------------------------------ > > > > > > rcu: Start RCU kthreads in TASK_INTERRUPTIBLE state > > > > > > Upon creation, kthreads are in TASK_UNINTERRUPTIBLE state, which can > > > result in softlockup warnings. Because some of RCU's kthreads can > > > legitimately be idle indefinitely, start them in TASK_INTERRUPTIBLE > > > state in order to avoid those warnings. > > > > Yes, it fixes the lock up warning. > > Very good, I have added your Tested-by. And, after having repeatedly shot myself in the foot trying to make an all-singing all-dancing RCU grace-period latency measurement tool, I fell back to simply measuring the RCU grace-period latency during the time that memory_dev_init() is running. This assumes that the grace periods are started using synchronize_rcu() -- if they are instead being started using call_rcu(), I can adapt to that as well. Please accept my apologies for the delay... 
							Thanx, Paul

------------------------------------------------------------------------

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 3da6a43..f877cf2 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -23,6 +23,7 @@
 #include <linux/mutex.h>
 #include <linux/stat.h>
 #include <linux/slab.h>
+#include <linux/rcupdate.h>
 
 #include <asm/atomic.h>
 #include <asm/uaccess.h>
@@ -647,6 +648,7 @@ int __init memory_dev_init(void)
 	int err;
 	unsigned long block_sz;
 
+	trace_rcu_gp_latency_start();
 	memory_sysdev_class.kset.uevent_ops = &memory_uevent_ops;
 	ret = sysdev_class_register(&memory_sysdev_class);
 	if (ret)
@@ -680,5 +682,6 @@ int __init memory_dev_init(void)
 out:
 	if (ret)
 		printk(KERN_ERR "%s() failed: %d\n", __func__, ret);
+	trace_rcu_gp_latency_stop();
 	return ret;
 }
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index fb2933d..a4abf8b 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -77,6 +77,8 @@ struct rcu_head {
 /* Exported common interfaces */
 extern void call_rcu_sched(struct rcu_head *head,
 			   void (*func)(struct rcu_head *rcu));
+void trace_rcu_gp_latency_start(void);
+void trace_rcu_gp_latency_stop(void);
 extern void synchronize_sched(void);
 extern void rcu_barrier_bh(void);
 extern void rcu_barrier_sched(void);
diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
index 40d9ed2..58629b5 100644
--- a/kernel/rcutorture.c
+++ b/kernel/rcutorture.c
@@ -887,6 +887,8 @@ rcu_torture_writer(void *arg)
 			cur_ops->deferred_free(old_rp);
 		}
 		rcutorture_record_progress(++rcu_torture_current_version);
+		if (rcu_torture_current_version == 40)
+			trace_rcu_gp_latency_stop();
 		oldbatch = cur_ops->completed();
 		rcu_stutter_wait("rcu_torture_writer");
 	} while (!kthread_should_stop() && fullstop == FULLSTOP_DONTSTOP);
@@ -1432,6 +1434,7 @@ rcu_torture_init(void)
 		  &sched_ops, &sched_sync_ops, &sched_expedited_ops, };
 
 	mutex_lock(&fullstop_mutex);
+	trace_rcu_gp_latency_start();
 
 	/* Process args and tell the world that the torturer is on the job. */
 	for (i = 0; i < ARRAY_SIZE(torture_ops); i++) {
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 8b4b3da..db43a3d 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1882,6 +1882,22 @@ void call_rcu_bh(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
 }
 EXPORT_SYMBOL_GPL(call_rcu_bh);
 
+static int trace_rcu_gp_latency = 0;
+
+void trace_rcu_gp_latency_start(void)
+{
+	printk(KERN_INFO "Starting RCU latency diagnostics\n");
+	trace_rcu_gp_latency = 1;
+}
+EXPORT_SYMBOL_GPL(trace_rcu_gp_latency_start);
+
+void trace_rcu_gp_latency_stop(void)
+{
+	trace_rcu_gp_latency = 0;
+	printk(KERN_INFO "Ending RCU latency diagnostics\n");
+}
+EXPORT_SYMBOL_GPL(trace_rcu_gp_latency_stop);
+
 /**
  * synchronize_sched - wait until an rcu-sched grace period has elapsed.
  *
@@ -1908,10 +1924,13 @@ EXPORT_SYMBOL_GPL(call_rcu_bh);
 void synchronize_sched(void)
 {
 	struct rcu_synchronize rcu;
+	ktime_t start, finish;
+	static int i;
 
 	if (rcu_blocking_is_gp())
 		return;
 
+	start = ktime_get();
 	init_rcu_head_on_stack(&rcu.head);
 	init_completion(&rcu.completion);
 	/* Will wake me after RCU finished. */
@@ -1919,6 +1938,14 @@ void synchronize_sched(void)
 	/* Wait for it. */
 	wait_for_completion(&rcu.completion);
 	destroy_rcu_head_on_stack(&rcu.head);
+	finish = ktime_get();
+	if (ACCESS_ONCE(trace_rcu_gp_latency)) {
+		printk(KERN_ALERT
+		       "synchronize_sched() duration %d microseconds\n",
+		       (int)ktime_us_delta(finish, start));
+		if (i++ < 10)
+			dump_stack();
+	}
 }
 EXPORT_SYMBOL_GPL(synchronize_sched);
 
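For readers who want to try the measurement idea outside the kernel, the pattern in the patch above (a global flag that gates reporting, plus timestamps taken around a blocking wait) can be sketched in userspace C. This is only an analogy under stated assumptions: clock_gettime(CLOCK_MONOTONIC) stands in for ktime_get(), printf() for printk(), and the names trace_latency, reports_printed, no_wait(), and timed_wait() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Global switch, analogous to the patch's trace_rcu_gp_latency flag. */
static volatile int trace_latency;
/* Reports printed so far; added only so the sketch can be checked. */
static int reports_printed;

static void trace_latency_start(void)
{
	printf("Starting latency diagnostics\n");
	trace_latency = 1;
}

static void trace_latency_stop(void)
{
	trace_latency = 0;
	printf("Ending latency diagnostics\n");
}

/* Microseconds between two monotonic timestamps. */
static long long elapsed_us(const struct timespec *start,
			    const struct timespec *finish)
{
	return (long long)(finish->tv_sec - start->tv_sec) * 1000000LL +
	       (finish->tv_nsec - start->tv_nsec) / 1000;
}

/* Stand-in for a blocking wait; it returns immediately. */
static void no_wait(void)
{
}

/*
 * Time one call to wait_fn().  The duration is always measured, but
 * it is reported only while tracing is switched on.
 */
static long long timed_wait(void (*wait_fn)(void))
{
	struct timespec start, finish;
	long long us;

	assert(wait_fn != NULL);
	clock_gettime(CLOCK_MONOTONIC, &start);
	wait_fn();
	clock_gettime(CLOCK_MONOTONIC, &finish);
	us = elapsed_us(&start, &finish);
	if (trace_latency) {
		printf("wait duration %lld microseconds\n", us);
		reports_printed++;
	}
	return us;
}
```

Bracketing only the region of interest with the start and stop calls keeps the output limited to that window, which is how the patch confines its reports to memory_dev_init().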
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  2011-05-28  1:04 ` Paul E. McKenney
@ 2011-05-28  4:03   ` Yinghai Lu
  2011-05-28  6:38     ` Paul E. McKenney
  0 siblings, 1 reply; 45+ messages in thread
From: Yinghai Lu @ 2011-05-28  4:03 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel, mingo, hpa, tglx, mingo

On 05/27/2011 06:04 PM, Paul E. McKenney wrote:
>
> And, after having repeatedly shot myself in the foot trying to make
> an all-singing all-dancing RCU grace-period latency measurement tool,
> I fell back to simply measuring the RCU grace-period latency during
> the time that memory_dev_init() is running.  This assumes that the
> grace periods are started using synchronize_rcu() -- if they are instead
> being started using call_rcu(), I can adapt to that as well.
>

[   31.635137] cpu_dev_init done
[   31.635320] Starting RCU latency diagnostics
[   71.605662] Ending RCU latency diagnostics
[   71.605924] memory_dev_init done

Thanks

Yinghai
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-28 4:03 ` Yinghai Lu @ 2011-05-28 6:38 ` Paul E. McKenney 0 siblings, 0 replies; 45+ messages in thread From: Paul E. McKenney @ 2011-05-28 6:38 UTC (permalink / raw) To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo On Fri, May 27, 2011 at 09:03:31PM -0700, Yinghai Lu wrote: > On 05/27/2011 06:04 PM, Paul E. McKenney wrote: > > > > And, after having repeatedly shot myself in the foot trying to make > > an all-singing all-dancing RCU grace-period latency measurement tool, > > I fell back to simply measuring the RCU grace-period latency during > > the time that memory_dev_init() is running. This assumes that the > > grace periods are started using synchronize_rcu() -- if they are instead > > being started using call_rcu(), I can adapt to that as well. > > > [ 31.635137] cpu_dev_init done > [ 31.635320] Starting RCU latency diagnostics > [ 71.605662] Ending RCU latency diagnostics > [ 71.605924] memory_dev_init done Thank you! Strange... Could you please try the following, which replaces the previous patch? 
							Thanx, Paul

------------------------------------------------------------------------

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 3da6a43..f877cf2 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -23,6 +23,7 @@
 #include <linux/mutex.h>
 #include <linux/stat.h>
 #include <linux/slab.h>
+#include <linux/rcupdate.h>
 
 #include <asm/atomic.h>
 #include <asm/uaccess.h>
@@ -647,6 +648,7 @@ int __init memory_dev_init(void)
 	int err;
 	unsigned long block_sz;
 
+	trace_rcu_gp_latency_start();
 	memory_sysdev_class.kset.uevent_ops = &memory_uevent_ops;
 	ret = sysdev_class_register(&memory_sysdev_class);
 	if (ret)
@@ -680,5 +682,6 @@ int __init memory_dev_init(void)
 out:
 	if (ret)
 		printk(KERN_ERR "%s() failed: %d\n", __func__, ret);
+	trace_rcu_gp_latency_stop();
 	return ret;
 }
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index fb2933d..a4abf8b 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -77,6 +77,8 @@ struct rcu_head {
 /* Exported common interfaces */
 extern void call_rcu_sched(struct rcu_head *head,
 			   void (*func)(struct rcu_head *rcu));
+void trace_rcu_gp_latency_start(void);
+void trace_rcu_gp_latency_stop(void);
 extern void synchronize_sched(void);
 extern void rcu_barrier_bh(void);
 extern void rcu_barrier_sched(void);
diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
index 40d9ed2..58629b5 100644
--- a/kernel/rcutorture.c
+++ b/kernel/rcutorture.c
@@ -887,6 +887,8 @@ rcu_torture_writer(void *arg)
 			cur_ops->deferred_free(old_rp);
 		}
 		rcutorture_record_progress(++rcu_torture_current_version);
+		if (rcu_torture_current_version == 40)
+			trace_rcu_gp_latency_stop();
 		oldbatch = cur_ops->completed();
 		rcu_stutter_wait("rcu_torture_writer");
 	} while (!kthread_should_stop() && fullstop == FULLSTOP_DONTSTOP);
@@ -1432,6 +1434,7 @@ rcu_torture_init(void)
 		  &sched_ops, &sched_sync_ops, &sched_expedited_ops, };
 
 	mutex_lock(&fullstop_mutex);
+	trace_rcu_gp_latency_start();
 
 	/* Process args and tell the world that the torturer is on the job. */
 	for (i = 0; i < ARRAY_SIZE(torture_ops); i++) {
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 8b4b3da..49c254b 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -889,6 +889,22 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
 	raw_spin_unlock_irqrestore(&rsp->onofflock, flags);
 }
 
+static int trace_rcu_gp_latency = 0;
+
+void trace_rcu_gp_latency_start(void)
+{
+	printk(KERN_INFO "Starting RCU latency diagnostics\n");
+	trace_rcu_gp_latency = 1;
+}
+EXPORT_SYMBOL_GPL(trace_rcu_gp_latency_start);
+
+void trace_rcu_gp_latency_stop(void)
+{
+	trace_rcu_gp_latency = 0;
+	printk(KERN_INFO "Ending RCU latency diagnostics\n");
+}
+EXPORT_SYMBOL_GPL(trace_rcu_gp_latency_stop);
+
 /*
  * Report a full set of quiescent states to the specified rcu_state
  * data structure.  This involves cleaning up after the prior grace
@@ -909,6 +925,9 @@ static void rcu_report_qs_rsp(struct rcu_state *rsp, unsigned long flags)
 	 */
 	smp_mb(); /* See above block comment. */
 	gp_duration = jiffies - rsp->gp_start;
+	if (ACCESS_ONCE(trace_rcu_gp_latency))
+		printk(KERN_ALERT
+		       "Grace period duration %lu jiffies\n", gp_duration);
 	if (gp_duration > rsp->gp_max)
 		rsp->gp_max = gp_duration;
 	rsp->completed = rsp->gpnum;
@@ -1803,7 +1822,10 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
 {
 	unsigned long flags;
 	struct rcu_data *rdp;
+	static int i;
 
+	if (ACCESS_ONCE(trace_rcu_gp_latency) && i++ < 10)
+		dump_stack();
 	debug_rcu_head_queue(head);
 	head->func = func;
 	head->next = NULL;
@@ -1908,10 +1930,13 @@ EXPORT_SYMBOL_GPL(call_rcu_bh);
 void synchronize_sched(void)
 {
 	struct rcu_synchronize rcu;
+	ktime_t start, finish;
+	static int i;
 
 	if (rcu_blocking_is_gp())
 		return;
 
+	start = ktime_get();
 	init_rcu_head_on_stack(&rcu.head);
 	init_completion(&rcu.completion);
 	/* Will wake me after RCU finished. */
@@ -1919,6 +1944,14 @@ void synchronize_sched(void)
 	/* Wait for it. */
 	wait_for_completion(&rcu.completion);
 	destroy_rcu_head_on_stack(&rcu.head);
+	finish = ktime_get();
+	if (ACCESS_ONCE(trace_rcu_gp_latency)) {
+		printk(KERN_ALERT
+		       "synchronize_sched() duration %d microseconds\n",
+		       (int)ktime_us_delta(finish, start));
+		if (i++ < 10)
+			dump_stack();
+	}
 }
 EXPORT_SYMBOL_GPL(synchronize_sched);
 
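Two small idioms in the patch above may be worth spelling out as a userspace sketch: computing a duration by unsigned subtraction of tick counts (as in "jiffies - rsp->gp_start"), which stays correct across counter wraparound, and limiting an expensive diagnostic such as dump_stack() to the first few occurrences. The struct and function names below are hypothetical, and the bounded counter is a slight variation on the patch's "i++ < 10" that stops incrementing once the limit is reached.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the grace-period fields used by the patch. */
struct gp_sketch {
	unsigned long gp_start; /* tick count when the grace period began */
	unsigned long gp_max;   /* longest duration seen so far, in ticks */
};

/*
 * Record the end of a grace period at tick count "now" and return its
 * duration.  Unsigned subtraction gives the right answer even if the
 * tick counter wrapped between gp_start and now.
 */
static unsigned long gp_sketch_end(struct gp_sketch *s, unsigned long now)
{
	unsigned long duration;

	assert(s != NULL);
	duration = now - s->gp_start;
	if (duration > s->gp_max)
		s->gp_max = duration;
	return duration;
}

/*
 * Return true for the first "limit" calls and false afterwards, so
 * that a costly diagnostic does not flood the log.  The counter stops
 * at the limit and therefore cannot overflow.
 */
static bool first_n_only(int *counter, int limit)
{
	assert(counter != NULL);
	if (*counter >= limit)
		return false;
	(*counter)++;
	return true;
}
```

For example, a grace period that starts three ticks before the counter wraps and ends at tick 5 is reported as 8 ticks long.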
* Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-23 22:01 ` Yinghai Lu 2011-05-23 22:55 ` Yinghai Lu @ 2011-05-24 1:12 ` Paul E. McKenney 1 sibling, 0 replies; 45+ messages in thread From: Paul E. McKenney @ 2011-05-24 1:12 UTC (permalink / raw) To: Yinghai Lu; +Cc: linux-kernel, mingo, hpa, tglx, mingo On Mon, May 23, 2011 at 03:01:24PM -0700, Yinghai Lu wrote: > On 05/23/2011 02:25 PM, Paul E. McKenney wrote: > > On Mon, May 23, 2011 at 01:14:22PM -0700, Yinghai Lu wrote: > >> On 05/21/2011 07:08 AM, Paul E. McKenney wrote: > >>> On Sat, May 21, 2011 at 06:18:44AM -0700, Paul E. McKenney wrote: > >>>> On Fri, May 20, 2011 at 05:02:40PM -0700, Yinghai Lu wrote: > >>>>> On 05/20/2011 04:49 PM, Paul E. McKenney wrote: > >>>>>> On Fri, May 20, 2011 at 04:16:28PM -0700, Yinghai Lu wrote: > >>>>> ... > >>>>>>> > >>>>>>> the same one i sent out before, but let DEBUG_LOCKING_API_SELFTESTS disabled. > >>>>>> > >>>>>> OK, just to make sure I understand... You are compiling exactly the > >>>>>> same kernel source tree with exactly the same .config, just with two > >>>>>> different versions of gcc, correct? > >>>>> yes. > >>>>>> > >>>>>> If so, it is quite possible that the slow one is the correct one. :-/ > >>>>> yeah, new version always have problem. > >>>>> > >>>>> looks like opensuse11.3 has 4.5.0 and fedora14 has 4.5.1 > >>>> > >>>> OK, so fedora14 is the fast one (4.5.1) and opensuse11.3 is the slow > >>>> one (4.5.0), correct? > >>> > >>> And does commit c7a3786030 help? This commit (from Peter Zijlstra) > >>> tidied up RCU kthreads' scheduler interactions. The patch is below, > >>> though it is probably more convenient to pull it from the rcu/next > >>> branch of: > >>> > >>> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git > >>> > > > > Thank you for testing this! > > it's ok. don't want to see our servers have problem with newer kernel. 
> > > This is with the same config that you emailed out on May 12th? > > yes. > > > > > In particular, CONFIG_TREE_RCU=y? > > > >> [ 337.132517] INFO: task rcun0:8 blocked for more than 120 seconds. > >> [ 337.133238] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >> [ 337.160396] rcun0 D 0000000000000000 0 8 2 0x00000000 > >> [ 337.161232] ffff882070d3fe90 0000000000000046 ffff882070d3e000 0000000000004000 > >> [ 337.161291] 00000000001d1f80 ffff882070d3ffd8 00000000001d1f80 ffff882070d3ffd8 > >> [ 337.161348] 0000000000004000 00000000001d1f80 ffff882070d18000 ffff882070d422b0 > >> [ 337.161404] Call Trace: > >> [ 337.161433] [<ffffffff810afab6>] ? __lock_release+0x166/0x16f > >> [ 337.161459] [<ffffffff81c1dae1>] ? _raw_spin_unlock_irqrestore+0x3f/0x46 > >> [ 337.161486] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137 > >> [ 337.161512] [<ffffffff810add8a>] ? trace_hardirqs_on+0xd/0xf > >> [ 337.161533] [<ffffffff810ce633>] ? rcu_cpu_kthread_should_stop+0x137/0x137 > >> [ 337.161558] [<ffffffff81099e41>] kthread+0x8c/0xa8 > >> [ 337.161584] [<ffffffff81c257d4>] kernel_thread_helper+0x4/0x10 > >> [ 337.161606] [<ffffffff81c1dd80>] ? retint_restore_args+0xe/0xe > >> [ 337.161627] [<ffffffff81099db5>] ? __init_kthread_worker+0x5b/0x5b > >> [ 337.161645] [<ffffffff81c257d0>] ? gs_change+0xb/0xb > >> [ 337.161651] no locks held by rcun0/8. > > > > This is quite surprising. The "rcun" kthreads invoke rcu_node_kthread(), > > which does not call rcu_cpu_kthread_should_stop(). > > > > But perhaps the stack backtrace got confused. > > > > Could you please try the following diagnostic patch to help me work out > > where the rcun threads are getting stuck? And this is also strange... > [ 275.679636] RAMDISK: gzip image found at block 0 > [ 277.504381] rcun ffffffff82440b00 starting wait for work. > [ 277.504693] rcun ffffffff82440b00 completed wait for work. > [ 277.504951] rcun ffffffff82440b00 initiating boost. 
> [ 277.515920] rcun ffffffff82440b00 completed boost. > [ 277.516157] rcun ffffffff82440b00 awaking rcuc2. > [ 277.535818] rcun ffffffff82440b00 awakened rcuc2. > [ 277.536075] rcun ffffffff82440b00 starting wait for work. > [ 277.604609] EXT3-fs: barriers not enabled > [ 277.605278] kjournald starting. Commit interval 5 seconds > [ 277.605473] EXT3-fs (ram0): warning: maximal mount count reached, running e2fsck is recommended > [ 277.605493] EXT3-fs (ram0): using internal journal > [ 277.605505] EXT3-fs (ram0): mounted filesystem with writeback data mode > [ 277.605555] VFS: Mounted root (ext3 filesystem) on device 1:0. > [ 277.605600] async_waiting @ 1 > [ 277.605604] async_continuing @ 1 after 0 usec > [ 277.669722] Freeing unused kernel memory: 2892k freed > INIT: version 2.86 booting > System Boot Control: Running /etc/init.d/boot > Mounting procfs at /proc done > Mounting sysfs at /sys done > Mounting debugfs at /sys/kernel/debug done > Mounting tmpfs at /dev done > Initializing /dev done > Mounting devpts at /dev/pts done > Boot logging started on /dev/ttyS0(/dev/console) at Wed May 25 13:43:55 2011 > FATAL: Could not load /lib/modules/2.6.39-tip-yh-05892-gb7b703b-dirty/modules.dep: No such file or directory > Setting up the hardware clockmodprobe: FATAL: Could not load /lib/modules/2.6.39-tip-yh-05892-gb7b703b-dirty/modules.dep: No such file or directory > > hwclock: With --noadjfile, you must specify either --utc or --localtime > ^[[8[ 277.836052] udevd (18674): /proc/18674/oom_adj is deprecated, please use /proc/18674/oom_score_adj instead. > fa[ 277.845689] udevd version 128 started > iled > Disabling IP forwarding done > done > Starting udevd: done > Loading drivers, configuring devices: [ 278.019705] rcun ffffffff82441300 starting wait for work. > [ 278.020057] rcun ffffffff82441300 completed wait for work. > [ 278.020314] rcun ffffffff82441300 initiating boost. > [ 278.035052] rcun ffffffff82441300 completed boost. 
> [ 278.035383] rcun ffffffff82441300 awaking rcuc141. > [ 278.054945] rcun ffffffff82441300 awakened rcuc141. > [ 278.055197] rcun ffffffff82441300 starting wait for work. > [ 278.159361] rcun ffffffff82441100 starting wait for work. > [ 278.159666] rcun ffffffff82441100 completed wait for work. > [ 278.159949] rcun ffffffff82441100 initiating boost. > [ 278.174805] rcun ffffffff82441100 completed boost. > [ 278.175044] rcun ffffffff82441100 awaking rcuc111. > [ 278.194728] rcun ffffffff82441100 awakened rcuc111. > [ 278.194987] rcun ffffffff82441100 starting wait for work. > [ 278.303039] rcun ffffffff82440c00 starting wait for work. > [ 278.303323] rcun ffffffff82440c00 completed wait for work. > [ 278.303568] rcun ffffffff82440c00 initiating boost. > [ 278.314519] rcun ffffffff82440c00 completed boost. > [ 278.314750] rcun ffffffff82440c00 awaking rcuc31. > [ 278.334455] rcun ffffffff82440c00 awakened rcuc31. > [ 278.334687] rcun ffffffff82440c00 starting wait for work. > [ 278.498822] rcun ffffffff82441400 starting wait for work. > [ 278.499131] rcun ffffffff82441400 completed wait for work. > [ 278.499412] rcun ffffffff82441400 initiating boost. > [ 278.514295] rcun ffffffff82441400 completed boost. > [ 278.514632] rcun ffffffff82441400 awaking rcuc151. > [ 278.534082] rcun ffffffff82441400 awakened rcuc151. > [ 278.534338] rcun ffffffff82441400 starting wait for work. > [ 278.686359] rcun ffffffff82440e00 starting wait for work. > [ 278.686670] rcun ffffffff82440e00 completed wait for work. > [ 278.686927] rcun ffffffff82440e00 initiating boost. > [ 278.703910] rcun ffffffff82440e00 completed boost. > [ 278.704148] rcun ffffffff82440e00 awaking rcuc51. > [ 278.723778] rcun ffffffff82440e00 awakened rcuc51. > [ 278.724036] rcun ffffffff82440e00 starting wait for work. > [ 278.762564] rcun ffffffff82440e00 completed wait for work. > [ 278.762863] rcun ffffffff82440e00 initiating boost. > [ 278.763540] rcun ffffffff82440e00 completed boost. 
> [ 278.763782] rcun ffffffff82440e00 awaking rcuc51. > [ 278.764012] rcun ffffffff82440e00 awakened rcuc51. > [ 278.783768] rcun ffffffff82440e00 starting wait for work. > [ 278.784047] rcun ffffffff82440e00 completed wait for work. > [ 278.803684] rcun ffffffff82440e00 initiating boost. > [ 278.803937] rcun ffffffff82440e00 completed boost. > [ 278.823598] rcun ffffffff82440e00 awaking rcuc51. > [ 278.823851] rcun ffffffff82440e00 awakened rcuc51. > [ 278.843603] rcun ffffffff82440e00 starting wait for work. > [ 278.922171] rcun ffffffff82441300 completed wait for work. > [ 278.922498] rcun ffffffff82441300 initiating boost. > [ 278.922762] rcun ffffffff82441300 completed boost. > [ 278.934066] rcun ffffffff82441300 awaking rcuc131. > [ 278.934376] rcun ffffffff82441300 awakened rcuc131. > [ 278.953529] rcun ffffffff82441300 starting wait for work. > [ 279.017973] rcun ffffffff82440e00 completed wait for work. > [ 279.018288] rcun ffffffff82440e00 initiating boost. > [ 279.018515] rcun ffffffff82440e00 completed boost. > [ 279.033336] rcun ffffffff82440e00 awaking rcuc51. > [ 279.034041] rcun ffffffff82440e00 awakened rcuc51. > [ 279.053188] rcun ffffffff82440e00 starting wait for work. > [ 279.149846] rcun ffffffff82441300 completed wait for work. > [ 279.150185] rcun ffffffff82441300 initiating boost. > [ 279.150438] rcun ffffffff82441300 completed boost. > [ 279.163080] rcun ffffffff82441300 awaking rcuc131. > [ 279.163341] rcun ffffffff82441300 awakened rcuc131. > [ 279.183285] rcun ffffffff82441300 starting wait for work. > [ 279.313608] rcun ffffffff82441300 completed wait for work. > [ 279.313983] rcun ffffffff82441300 initiating boost. > [ 279.314216] rcun ffffffff82441300 completed boost. > [ 279.332841] rcun ffffffff82441300 awaking rcuc131. > [ 279.333359] rcun ffffffff82441300 awakened rcuc131. > [ 279.352563] rcun ffffffff82441300 starting wait for work. > [ 279.409412] rcun ffffffff82441300 completed wait for work. 
> [ 279.409775] rcun ffffffff82441300 initiating boost. > [ 279.410057] rcun ffffffff82441300 completed boost. > [ 279.422561] rcun ffffffff82441300 awaking rcuc131. > [ 279.422810] rcun ffffffff82441300 awakened rcuc131. > [ 279.442473] rcun ffffffff82441300 starting wait for work. > [ 279.932452] rcun ffffffff82441100 completed wait for work. > [ 279.932806] rcun ffffffff82441100 initiating boost. > [ 279.933047] rcun ffffffff82441100 completed boost. > [ 279.952298] rcun ffffffff82441100 awaking rcuc110. > [ 279.952658] rcun ffffffff82441100 awakened rcuc110. > [ 279.971749] rcun ffffffff82441100 starting wait for work. > [ 279.972249] rcun ffffffff82441100 completed wait for work. > [ 279.991659] rcun ffffffff82441100 initiating boost. > [ 279.992066] rcun ffffffff82441100 completed boost. > [ 280.011403] rcun ffffffff82441100 awaking rcuc110. > [ 280.011658] rcun ffffffff82441100 awakened rcuc110. > [ 280.012070] rcun ffffffff82441100 starting wait for work. > [ 280.112094] rcun ffffffff82440b00 completed wait for work. > [ 280.112427] rcun ffffffff82440b00 initiating boost. > [ 280.112674] rcun ffffffff82440b00 completed boost. > [ 280.131375] rcun ffffffff82440b00 awaking rcuc11. > [ 280.131651] rcun ffffffff82440b00 awakened rcuc11. > [ 280.151524] rcun ffffffff82440b00 starting wait for work. > [ 280.459704] rcun ffffffff82440b00 completed wait for work. > [ 280.459997] rcun ffffffff82440b00 initiating boost. > [ 280.460228] rcun ffffffff82440b00 completed boost. > [ 280.470779] rcun ffffffff82440b00 awaking rcuc0. > [ 280.471062] rcun ffffffff82440b00 awakened rcuc0. > [ 280.490721] rcun ffffffff82440b00 starting wait for work. > [ 280.567316] rcun ffffffff82441400 completed wait for work. > [ 280.567647] rcun ffffffff82441400 initiating boost. > [ 280.567897] rcun ffffffff82441400 completed boost. > [ 280.580815] rcun ffffffff82441400 awaking rcuc151. > [ 280.581116] rcun ffffffff82441400 awakened rcuc151. 
> [ 280.600382] rcun ffffffff82441400 starting wait for work. > [ 280.695170] rcun ffffffff82440f00 starting wait for work. > [ 280.695506] rcun ffffffff82440f00 completed wait for work. > [ 280.695847] rcun ffffffff82440f00 initiating boost. > [ 280.710661] rcun ffffffff82440f00 completed boost. > [ 280.711207] rcun ffffffff82440f00 awaking rcuc71. > [ 280.730198] rcun ffffffff82440f00 awakened rcuc71. > [ 280.730443] rcun ffffffff82440f00 starting wait for work. > [ 281.601394] rcun ffffffff82440c00 completed wait for work. > [ 281.601753] rcun ffffffff82440c00 initiating boost. > [ 281.602004] rcun ffffffff82440c00 completed boost. > [ 281.618891] rcun ffffffff82440c00 awaking rcuc30. > [ 281.619164] rcun ffffffff82440c00 awakened rcuc30. > [ 281.638755] rcun ffffffff82440c00 starting wait for work. > [ 281.729334] rcun ffffffff82441300 completed wait for work. > [ 281.729661] rcun ffffffff82441300 initiating boost. > [ 281.729920] rcun ffffffff82441300 completed boost. > [ 281.748587] rcun ffffffff82441300 awaking rcuc131. > [ 281.748871] rcun ffffffff82441300 awakened rcuc131. > [ 281.768287] rcun ffffffff82441300 starting wait for work. > [ 281.905078] rcun ffffffff82440b00 completed wait for work. > [ 281.905380] rcun ffffffff82440b00 initiating boost. > [ 281.905623] rcun ffffffff82440b00 completed boost. > [ 281.918170] rcun ffffffff82440b00 awaking rcuc11. > [ 281.918450] rcun ffffffff82440b00 awakened rcuc11. > [ 281.938055] rcun ffffffff82440b00 starting wait for work. > [ 282.240380] rcun ffffffff82441300 completed wait for work. > [ 282.240667] rcun ffffffff82441300 initiating boost. > [ 282.240890] rcun ffffffff82441300 completed boost. > [ 282.257498] rcun ffffffff82441300 awaking rcuc130. > [ 282.257772] rcun ffffffff82441300 awakened rcuc130. > [ 282.277380] rcun ffffffff82441300 starting wait for work. > [ 282.304255] rcun ffffffff82441300 completed wait for work. > [ 282.304551] rcun ffffffff82441300 initiating boost. 
> [ 282.304792] rcun ffffffff82441300 completed boost. > [ 282.317376] rcun ffffffff82441300 awaking rcuc130. > [ 282.317639] rcun ffffffff82441300 awakened rcuc130. > [ 282.337291] rcun ffffffff82441300 starting wait for work. > [ 282.427834] rcun ffffffff82440e00 completed wait for work. > [ 282.428165] rcun ffffffff82440e00 initiating boost. > [ 282.428404] rcun ffffffff82440e00 completed boost. > [ 282.447168] rcun ffffffff82440e00 awaking rcuc50. > [ 282.447398] rcun ffffffff82440e00 awakened rcuc50. > [ 282.467022] rcun ffffffff82440e00 starting wait for work. > [ 282.543751] rcun ffffffff82441300 completed wait for work. > [ 282.544030] rcun ffffffff82441300 initiating boost. > [ 282.544262] rcun ffffffff82441300 completed boost. > [ 282.556969] rcun ffffffff82441300 awaking rcuc130. > [ 282.557221] rcun ffffffff82441300 awakened rcuc130. > [ 282.576959] rcun ffffffff82441300 starting wait for work. > [ 282.651510] rcun ffffffff82440e00 completed wait for work. > [ 282.651859] rcun ffffffff82440e00 initiating boost. > [ 282.652115] rcun ffffffff82440e00 completed boost. > [ 282.666799] rcun ffffffff82440e00 awaking rcuc50. > [ 282.667062] rcun ffffffff82440e00 awakened rcuc50. > [ 282.686638] rcun ffffffff82440e00 starting wait for work. > [ 283.469957] rcun ffffffff82440c00 completed wait for work. > [ 283.470235] rcun ffffffff82440c00 initiating boost. > [ 283.470457] rcun ffffffff82440c00 completed boost. > [ 283.485312] rcun ffffffff82440c00 awaking rcuc20. > [ 283.485548] rcun ffffffff82440c00 awakened rcuc20. > [ 283.505188] rcun ffffffff82440c00 starting wait for work. > [ 283.513893] rcun ffffffff82440c00 completed wait for work. > [ 283.525161] rcun ffffffff82440c00 initiating boost. > [ 283.525381] rcun ffffffff82440c00 completed boost. > [ 283.545051] rcun ffffffff82440c00 awaking rcuc20. > [ 283.545289] rcun ffffffff82440c00 awakened rcuc20. > [ 283.545497] rcun ffffffff82440c00 starting wait for work. 
> [ 283.577780] rcun ffffffff82440c00 completed wait for work. > [ 283.578067] rcun ffffffff82440c00 initiating boost. > [ 283.585133] rcun ffffffff82440c00 completed boost. > [ 283.585366] rcun ffffffff82440c00 awaking rcuc20. > [ 283.604979] rcun ffffffff82440c00 awakened rcuc20. > [ 283.605219] rcun ffffffff82440c00 starting wait for work. > [ 283.673621] rcun ffffffff82440c00 completed wait for work. > [ 283.673904] rcun ffffffff82440c00 initiating boost. > [ 283.674126] rcun ffffffff82440c00 completed boost. > [ 283.684951] rcun ffffffff82440c00 awaking rcuc20. > [ 283.685186] rcun ffffffff82440c00 awakened rcuc20. > [ 283.704835] rcun ffffffff82440c00 starting wait for work. > [ 283.721536] rcun ffffffff82440c00 completed wait for work. > [ 283.724733] rcun ffffffff82440c00 initiating boost. > [ 283.724974] rcun ffffffff82440c00 completed boost. > [ 283.744676] rcun ffffffff82440c00 awaking rcuc20. > [ 283.744921] rcun ffffffff82440c00 awakened rcuc20. > [ 283.745142] rcun ffffffff82440c00 starting wait for work. > [ 283.849306] rcun ffffffff82440c00 completed wait for work. > [ 283.849580] rcun ffffffff82440c00 initiating boost. > [ 283.849806] rcun ffffffff82440c00 completed boost. > [ 283.864625] rcun ffffffff82440c00 awaking rcuc20. > [ 283.864859] rcun ffffffff82440c00 awakened rcuc20. > [ 283.884509] rcun ffffffff82440c00 starting wait for work. > [ 283.897233] rcun ffffffff82440c00 completed wait for work. > [ 283.904500] rcun ffffffff82440c00 initiating boost. > [ 283.904740] rcun ffffffff82440c00 completed boost. > [ 283.924388] rcun ffffffff82440c00 awaking rcuc20. > [ 283.924639] rcun ffffffff82440c00 awakened rcuc20. > [ 283.924857] rcun ffffffff82440c00 starting wait for work. > [ 283.961137] rcun ffffffff82440c00 completed wait for work. > [ 283.961412] rcun ffffffff82440c00 initiating boost. > [ 283.964375] rcun ffffffff82440c00 completed boost. > [ 283.964585] rcun ffffffff82440c00 awaking rcuc20. 
> [ 283.984347] rcun ffffffff82440c00 awakened rcuc20. > [ 283.984580] rcun ffffffff82440c00 starting wait for work. > [ 284.064957] rcun ffffffff82440c00 completed wait for work. > [ 284.065249] rcun ffffffff82440c00 initiating boost. > [ 284.065462] rcun ffffffff82440c00 completed boost. > [ 284.084281] rcun ffffffff82440c00 awaking rcuc20. > [ 284.084506] rcun ffffffff82440c00 awakened rcuc20. > [ 284.104135] rcun ffffffff82440c00 starting wait for work. > [ 284.124914] rcun ffffffff82440c00 completed wait for work. > [ 284.125251] rcun ffffffff82440c00 initiating boost. > [ 284.125488] rcun ffffffff82440c00 completed boost. > [ 284.144394] rcun ffffffff82440c00 awaking rcuc20. > [ 284.144636] rcun ffffffff82440c00 awakened rcuc20. > [ 284.163985] rcun ffffffff82440c00 starting wait for work. > [ 284.352449] rcun ffffffff82440c00 completed wait for work. > [ 284.352722] rcun ffffffff82440c00 initiating boost. > [ 284.352965] rcun ffffffff82440c00 completed boost. > [ 284.363729] rcun ffffffff82440c00 awaking rcuc21. > [ 284.363952] rcun ffffffff82440c00 awakened rcuc21. > [ 284.383609] rcun ffffffff82440c00 starting wait for work. > [ 284.400355] rcun ffffffff82440c00 completed wait for work. > [ 284.403541] rcun ffffffff82440c00 initiating boost. > [ 284.403752] rcun ffffffff82440c00 completed boost. > [ 284.423478] rcun ffffffff82440c00 awaking rcuc21. > [ 284.423730] rcun ffffffff82440c00 awakened rcuc21. > [ 284.423961] rcun ffffffff82440c00 starting wait for work. > [ 284.464264] rcun ffffffff82440c00 completed wait for work. > [ 284.464531] rcun ffffffff82440c00 initiating boost. > [ 284.464739] rcun ffffffff82440c00 completed boost. > [ 284.483510] rcun ffffffff82440c00 awaking rcuc21. > [ 284.483725] rcun ffffffff82440c00 awakened rcuc21. > [ 284.503396] rcun ffffffff82440c00 starting wait for work. > [ 284.524166] rcun ffffffff82440c00 completed wait for work. > [ 284.524430] rcun ffffffff82440c00 initiating boost. 
> [ 284.524660] rcun ffffffff82440c00 completed boost. > [ 284.543410] rcun ffffffff82440c00 awaking rcuc21. > [ 284.543634] rcun ffffffff82440c00 awakened rcuc21. > [ 284.563295] rcun ffffffff82440c00 starting wait for work. > done > Loading required kernel modules done > Activating device mapper... > FATAL: Could not load /lib/modules/2.6.39-tip-yh-05892-gb7b703b-dirty/modules.dep: No such file or directory > failed > Starting MD Raid unused > Waiting for udev to settle... > Scanning for LVM volume groups... > File descriptor 3 left open > Reading all physical volumes. This may take a while... > Activating LVM volume groups... > File descriptor 3 left open > done > Waiting for /firmware > microcode . no more events > Checking file systems... > fsck 1.41.1 (01-Sep-2008) > Checking all file systems. done > done > Mounting local file systems... > /proc on /proc type proc (rw) > sysfs on /sys type sysfs (rw) > debugfs on /sys/kernel/debug type debugfs (rw) > udev on /dev type tmpfs (rw) > devpts on /dev/pts type devpts (rw,mode=0620,gid=5) > /firmware on /lib/firmware type tmpfs (rw) > microcode on /usr/lib/microcode type tmpfs (rw) done > Activating remaining swap-devices in /etc/fstab... done > Setting up linker cache (/etc/ld.so.cache) using ldconfig done > Creating /var/log/boot.msg done > Using boot-specified hostname 'lb-g5plus-1t-host' > Setting up hostname 'lb-g5plus-1t-host' done > Setting up loopback interface lo > lo IP address: 127.0.0.1/8 > IP address: 127.0.0.2/8 > done > System Boot Control: The system has been set up > Skipped features: boot.md > System Boot Control: Running /etc/init.d/boot.local done > INIT: Entering runlevel: 3 > Boot logging started on /dev/ttyS0(/dev/console) at Wed May 25 13:44:05 2011 > Master Resource Control: previous runlevel: N, switching to runlevel:3 > Starting D-Bus daemon done > Initializing random number generator done > [ 287.794818] rcun ffffffff82440f00 completed wait for work. 
> [ 287.795218] rcun ffffffff82440f00 initiating boost. > [ 287.795440] rcun ffffffff82440f00 completed boost. > [ 287.807922] rcun ffffffff82440f00 awaking rcuc71. > [ 287.808186] rcun ffffffff82440f00 awakened rcuc71. > [ 287.808195] rcun ffffffff82440f00 starting wait for work. > Starting syslog services done > [ 288.693183] rcun ffffffff82440e00 completed wait for work. > [ 288.693586] rcun ffffffff82440e00 initiating boost. > [ 288.693810] rcun ffffffff82440e00 completed boost. > [ 288.706108] rcun ffffffff82440e00 awaking rcuc50. > [ 288.706352] rcun ffffffff82440e00 awakened rcuc50. > [ 288.725895] rcun ffffffff82440e00 starting wait for work. > [ 288.726166] rcun ffffffff82440e00 completed wait for work. > [ 288.745842] rcun ffffffff82440e00 initiating boost. > [ 288.746067] rcun ffffffff82440e00 completed boost. > [ 288.765741] rcun ffffffff82440e00 awaking rcuc50. > [ 288.765967] rcun ffffffff82440e00 awakened rcuc50. > [ 288.766203] rcun ffffffff82440e00 starting wait for work. > Loading CPUFreq modules done > Starting HAL daemon done > Setting up (localfs) network interfaces: > lo > lo IP address: 127.0.0.1/8 > IP address: 127.0.0.2/8 done > [ 289.323903] rcun ffffffff82440c00 completed wait for work. > [ 289.324239] rcun ffffffff82440c00 initiating boost. > [ 289.324458] rcun ffffffff82440c00 completed boost. > [ 289.334891] rcun ffffffff82440c00 awaking rcuc20. > [ 289.335112] rcun ffffffff82440c00 awakened rcuc20. > [ 289.354785] rcun ffffffff82440c00 starting wait for work. > eth0 device: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06) > No configuration found for eth0 unused > eth1 device: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06) > No configuration found for eth1 unused > [ 289.859076] rcun ffffffff82440c00 completed wait for work. > [ 289.859428] rcun ffffffff82440c00 initiating boost. > [ 289.859641] rcun ffffffff82440c00 completed boost. > [ 289.873915] rcun ffffffff82440c00 awaking rcuc31. 
> [ 289.874127] rcun ffffffff82440c00 awakened rcuc31. > [ 289.893814] rcun ffffffff82440c00 starting wait for work. > eth10 device: Intel Corporation 82576 Gigabit Network Connection (rev 01) > No configuration found for eth10 unused > [ 290.106822] rcun ffffffff82440b00 completed wait for work. > [ 290.107203] rcun ffffffff82440b00 initiating boost. > [ 290.107436] rcun ffffffff82440b00 completed boost. > [ 290.123462] rcun ffffffff82440b00 awaking rcuc11. > [ 290.123677] rcun ffffffff82440b00 awakened rcuc11. > [ 290.143342] rcun ffffffff82440b00 starting wait for work. > eth11 device: Intel Corporation 82576 Gigabit Network Connection (rev 01) > No configuration found for eth11 unused > [ 290.350351] rcun ffffffff82441100 completed wait for work. > [ 290.350744] rcun ffffffff82441100 initiating boost. > [ 290.351016] rcun ffffffff82441100 completed boost. > [ 290.363076] rcun ffffffff82441100 awaking rcuc111. > [ 290.363345] rcun ffffffff82441100 awakened rcuc111. > [ 290.382943] rcun ffffffff82441100 starting wait for work. > eth12 device: Intel Corporation 82599EB 10 Gigabit Network Connection (rev 01) > No configuration found for eth12 unused > [ 290.610137] rcun ffffffff82440b00 completed wait for work. > [ 290.610487] rcun ffffffff82440b00 initiating boost. > [ 290.610719] rcun ffffffff82440b00 completed boost. > [ 290.622562] rcun ffffffff82440b00 awaking rcuc11. > [ 290.622774] rcun ffffffff82440b00 awakened rcuc11. > [ 290.642453] rcun ffffffff82440b00 starting wait for work. > [ 290.642720] rcun ffffffff82440b00 completed wait for work. > [ 290.662423] rcun ffffffff82440b00 initiating boost. > [ 290.662643] rcun ffffffff82440b00 completed boost. > eth13 device: Intel Corporation 82599EB 10 Gigabit Network Connection (rev 01) > [ 290.682708] rcun ffffffff82440b00 awaking rcuc11. > [ 290.702367] rcun ffffffff82440b00 awakened rcuc11. > No configuration f[ 290.702694] rcun ffffffff82440b00 starting wait for work. 
> ound for eth13 unused > [ 290.865245] rcun ffffffff82440e00 completed wait for work. > [ 290.865586] rcun ffffffff82440e00 initiating boost. > [ 290.865802] rcun ffffffff82440e00 completed boost. > [ 290.882235] rcun ffffffff82440e00 awaking rcuc51. > [ 290.882530] rcun ffffffff82440e00 awakened rcuc51. > [ 290.902004] rcun ffffffff82440e00 starting wait for work. > eth14 device: Intel Corporation 82599EB 10-Gigabit KX4 Network Connection (rev 01) > No configuration found for eth14 unused > [ 291.128769] rcun ffffffff82440c00 completed wait for work. > [ 291.129110] rcun ffffffff82440c00 initiating boost. > [ 291.129333] rcun ffffffff82440c00 completed boost. > [ 291.141675] rcun ffffffff82440c00 awaking rcuc31. > [ 291.141904] rcun ffffffff82440c00 awakened rcuc31. > [ 291.161527] rcun ffffffff82440c00 starting wait for work. > eth15 device: Intel Corporation 82599EB 10-Gigabit KX4 Network Connection (rev 01) > No configuration found for eth15 unused > [ 291.380340] rcun ffffffff82440c00 completed wait for work. > [ 291.380664] rcun ffffffff82440c00 initiating boost. > [ 291.380883] rcun ffffffff82440c00 completed boost. > [ 291.391214] rcun ffffffff82440c00 awaking rcuc21. > [ 291.391431] rcun ffffffff82440c00 awakened rcuc21. > [ 291.411105] rcun ffffffff82440c00 starting wait for work. > [ 291.432271] rcun ffffffff82440c00 completed wait for work. > [ 291.432551] rcun ffffffff82440c00 initiating boost. > [ 291.432785] rcun ffffffff82440c00 completed boost. > [ 291.451161] rcun ffffffff82440c00 awaking rcuc21. > [ 291.451167] rcun ffffffff82440c00 awakened rcuc21. > [ 291.451170] rcun ffffffff82440c00 starting wait for work. > eth16 device: Intel Corporation 82599EB 10-Gigabit KX4 Network Connection (rev 01) > No configuration found for eth16 unused > [ 291.652073] rcun ffffffff82440b00 completed wait for work. > [ 291.652448] rcun ffffffff82440b00 initiating boost. > [ 291.652741] rcun ffffffff82440b00 completed boost. 
> [ 291.670699] rcun ffffffff82440b00 awaking rcuc1. > [ 291.670945] rcun ffffffff82440b00 awakened rcuc1. > [ 291.690597] rcun ffffffff82440b00 starting wait for work. > eth17 device: Intel Corporation 82599EB 10-Gigabit KX4 Network Connection (rev 01) > No configuration found for eth17 unused > [ 291.931570] rcun ffffffff82441000 starting wait for work. > [ 291.931899] rcun ffffffff82441000 completed wait for work. > [ 291.932142] rcun ffffffff82441000 initiating boost. > [ 291.950219] rcun ffffffff82441000 completed boost. > [ 291.950445] rcun ffffffff82441000 awaking rcuc90. > [ 291.970127] rcun ffffffff82441000 awakened rcuc90. > [ 291.970344] rcun ffffffff82441000 starting wait for work. > [ 291.990092] rcun ffffffff82441000 completed wait for work. > [ 291.990101] rcun ffffffff82441000 initiating boost. > [ 291.990105] rcun ffffffff82441000 completed boost. > [ 291.990109] rcun ffffffff82441000 awaking rcuc90. > [ 291.990119] rcun ffffffff82441000 awakened rcuc90. > [ 291.990123] rcun ffffffff82441000 starting wait for work. > eth18 device: Intel Corporation 82599EB 10 Gigabit Network Connection (rev 01) > No configuration found for eth18 unused > eth19 device: Intel Corporation 82599EB 10 Gigabit Network Connection (rev 01) > No configuration found for eth19 unused > [ 292.470547] rcun ffffffff82440e00 completed wait for work. > [ 292.470936] rcun ffffffff82440e00 initiating boost. > [ 292.471191] rcun ffffffff82440e00 completed boost. > [ 292.489291] rcun ffffffff82440e00 awaking rcuc51. > [ 292.489550] rcun ffffffff82440e00 awakened rcuc51. > [ 292.509226] rcun ffffffff82440e00 starting wait for work. > [ 292.509478] rcun ffffffff82440e00 completed wait for work. > [ 292.529136] rcun ffffffff82440e00 initiating boost. > [ 292.529141] rcun ffffffff82440e00 completed boost. > [ 292.529144] rcun ffffffff82440e00 awaking rcuc51. > [ 292.529151] rcun ffffffff82440e00 awakened rcuc51. > [ 292.529155] rcun ffffffff82440e00 starting wait for work. 
> eth2 device: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06) > No configuration found for eth2 unused > [ 292.753993] rcun ffffffff82440c00 completed wait for work. > [ 292.754348] rcun ffffffff82440c00 initiating boost. > [ 292.754596] rcun ffffffff82440c00 completed boost. > [ 292.768779] rcun ffffffff82440c00 awaking rcuc31. > [ 292.769018] rcun ffffffff82440c00 awakened rcuc31. > [ 292.788632] rcun ffffffff82440c00 starting wait for work. > eth20 device: Intel Corporation 82599EB 10-Gigabit KX4 Network Connection (rev 01) > No configuration found for eth20 unused > [ 292.985585] rcun ffffffff82440c00 completed wait for work. > [ 292.985925] rcun ffffffff82440c00 initiating boost. > [ 292.986150] rcun ffffffff82440c00 completed boost. > [ 292.998422] rcun ffffffff82440c00 awaking rcuc31. > [ 292.998647] rcun ffffffff82440c00 awakened rcuc31. > [ 293.018318] rcun ffffffff82440c00 starting wait for work. > eth21 device: Intel Corporation 82599EB 10-Gigabit KX4 Network Connection (rev 01) > No configuration found for eth21 unused > [ 293.289596] rcun ffffffff82440b00 completed wait for work. > [ 293.289987] rcun ffffffff82440b00 initiating boost. > [ 293.290245] rcun ffffffff82440b00 completed boost. > [ 293.307893] rcun ffffffff82440b00 awaking rcuc1. > [ 293.308159] rcun ffffffff82440b00 awakened rcuc1. > [ 293.327728] rcun ffffffff82440b00 starting wait for work. > [ 293.337111] rcun ffffffff82440b00 completed wait for work. > [ 293.347737] rcun ffffffff82440b00 initiating boost. > [ 293.347958] rcun ffffffff82440b00 completed boost. > eth22 de[ 293.367776] rcun ffffffff82440b00 awaking rcuc1. > vice: Intel Corporation 82599EB [ 293.368186] rcun ffffffff82440b00 awakened rcuc1. > 10 Gigabit Netwo[ 293.368191] rcun ffffffff82440b00 starting wait for work. 
> rk Connection (rev 01) > No configuration found for eth22 unused > eth23 device: Intel Corporation 82599EB 10 Gigabit Network Connection (rev 01) > No configuration found for eth23 unused > eth24 device: Intel Corporation 82599EB 10-Gigabit KX4 Network Connection (rev 01) > No configuration found for eth24 unused > [ 294.075710] rcun ffffffff82440e00 completed wait for work. > [ 294.076063] rcun ffffffff82440e00 initiating boost. > [ 294.076287] rcun ffffffff82440e00 completed boost. > [ 294.086523] rcun ffffffff82440e00 awaking rcuc51. > [ 294.086800] rcun ffffffff82440e00 awakened rcuc51. > [ 294.106385] rcun ffffffff82440e00 starting wait for work. > [ 294.135592] rcun ffffffff82440e00 completed wait for work. > [ 294.135876] rcun ffffffff82440e00 initiating boost. > [ 294.136103] rcun ffffffff82440e00 completed boost. > [ 294.146373] rcun ffffffff82440e00 awaking rcuc51. > [ 294.146643] rcun ffffffff82440e00 awakened rcuc51. > [ 294.166295] rcun ffffffff82440e00 starting wait for work. > eth25 device: Intel Corporation 82599EB 10-Gigabit KX4 Network Connection (rev 01) > No configuration found for eth25 unused > [ 294.383392] rcun ffffffff82440e00 completed wait for work. > [ 294.383728] rcun ffffffff82440e00 initiating boost. > [ 294.383945] rcun ffffffff82440e00 completed boost. > [ 294.395907] rcun ffffffff82440e00 awaking rcuc51. > [ 294.396134] rcun ffffffff82440e00 awakened rcuc51. > [ 294.415794] rcun ffffffff82440e00 starting wait for work. > eth26 device: QLogic Corp. 10GbE Converged Network Adapter (TCP/IP Networking) (rev 02) > No configuration found for eth26 unused > [ 294.658799] rcun ffffffff82440c00 completed wait for work. > [ 294.659130] rcun ffffffff82440c00 initiating boost. > [ 294.659341] rcun ffffffff82440c00 completed boost. > [ 294.675419] rcun ffffffff82440c00 awaking rcuc31. > [ 294.675645] rcun ffffffff82440c00 awakened rcuc31. > [ 294.695468] rcun ffffffff82440c00 starting wait for work. > eth27 device: QLogic Corp. 
10GbE Converged Network Adapter (TCP/IP Networking) (rev 02) > No configuration found for eth27 unused > [ 294.918239] rcun ffffffff82440b00 completed wait for work. > [ 294.918602] rcun ffffffff82440b00 initiating boost. > [ 294.918834] rcun ffffffff82440b00 completed boost. > [ 294.934983] rcun ffffffff82440b00 awaking rcuc0. > [ 294.935245] rcun ffffffff82440b00 awakened rcuc0. > [ 294.954854] rcun ffffffff82440b00 starting wait for work. > eth3 device: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06) > No configuration found for eth3 unused > [ 295.233919] rcun ffffffff82440d00 starting wait for work. > [ 295.234268] rcun ffffffff82440d00 completed wait for work. > [ 295.234550] rcun ffffffff82440d00 initiating boost. > [ 295.244455] rcun ffffffff82440d00 completed boost. > [ 295.244686] rcun ffffffff82440d00 awaking rcuc40. > eth4 de[ 295.264423] rcun ffffffff82440d00 awakened rcuc40. > vice: Intel Corporation 82576 Gi[ 295.264682] rcun ffffffff82440d00 starting wait for work. > gabit Network Connection (rev 01) > No configuration found for eth4 unused > [ 295.477285] rcun ffffffff82440f00 completed wait for work. > [ 295.477635] rcun ffffffff82440f00 initiating boost. > [ 295.477861] rcun ffffffff82440f00 completed boost. > [ 295.494030] rcun ffffffff82440f00 awaking rcuc70. > [ 295.494278] rcun ffffffff82440f00 awakened rcuc70. > [ 295.513989] rcun ffffffff82440f00 starting wait for work. > eth5 device: Intel Corporation 82576 Gigabit Network Connection (rev 01) > No configuration found for eth5 unused > [ 295.720833] rcun ffffffff82440d00 completed wait for work. > [ 295.721156] rcun ffffffff82440d00 initiating boost. > [ 295.721376] rcun ffffffff82440d00 completed boost. > [ 295.733558] rcun ffffffff82440d00 awaking rcuc41. > [ 295.733799] rcun ffffffff82440d00 awakened rcuc41. > [ 295.753424] rcun ffffffff82440d00 starting wait for work. 
> eth6 device: Intel Corporation 82576 Gigabit Network Connection (rev 01) > No configuration found for eth6 unused > [ 296.000356] rcun ffffffff82440c00 completed wait for work. > [ 296.000716] rcun ffffffff82440c00 initiating boost. > [ 296.000942] rcun ffffffff82440c00 completed boost. > [ 296.013086] rcun ffffffff82440c00 awaking rcuc31. > [ 296.013305] rcun ffffffff82440c00 awakened rcuc31. > [ 296.032935] rcun ffffffff82440c00 starting wait for work. > eth7 device: Intel Corporation 82576 Gigabit Network Connection (rev 01) > No configuration found for eth7 unused > [ 296.220235] rcun ffffffff82440e00 completed wait for work. > [ 296.220570] rcun ffffffff82440e00 initiating boost. > [ 296.220796] rcun ffffffff82440e00 completed boost. > [ 296.232694] rcun ffffffff82440e00 awaking rcuc51. > [ 296.232934] rcun ffffffff82440e00 awakened rcuc51. > [ 296.252558] rcun ffffffff82440e00 starting wait for work. > [ 296.252841] rcun ffffffff82440e00 completed wait for work. > [ 296.272515] rcun ffffffff82440e00 initiating boost. > [ 296.272784] rcun ffffffff82440e00 completed boost. > [ 296.292387] rcun ffffffff82440e00 awaking rcuc51. > [ 296.292647] rcun ffffffff82440e00 awakened rcuc51. > [ 296.292920] rcun ffffffff82440e00 starting wait for work. > eth8 device: Intel Corporation 82576 Gigabit Network Connection (rev 01) > No configuration found for eth8 unused > [ 296.483877] rcun ffffffff82440b00 completed wait for work. > [ 296.484237] rcun ffffffff82440b00 initiating boost. > [ 296.484456] rcun ffffffff82440b00 completed boost. > [ 296.502204] rcun ffffffff82440b00 awaking rcuc1. > [ 296.502491] rcun ffffffff82440b00 awakened rcuc1. > [ 296.522201] rcun ffffffff82440b00 starting wait for work. > [ 296.522459] rcun ffffffff82440b00 completed wait for work. > [ 296.542024] rcun ffffffff82440b00 initiating boost. > [ 296.542317] rcun ffffffff82440b00 completed boost. > [ 296.561935] rcun ffffffff82440b00 awaking rcuc1. 
> [ 296.562213] rcun ffffffff82440b00 awakened rcuc1. > [ 296.562431] rcun ffffffff82440b00 starting wait for work. > eth9 device: Intel Corporation 82576 Gigabit Network Connection (rev 01) > No configuration found for eth9 unused > ib0 device: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0) > No configuration found for ib0 unused > [ 297.014805] rcun ffffffff82440b00 completed wait for work. > [ 297.015156] rcun ffffffff82440b00 initiating boost. > [ 297.015406] rcun ffffffff82440b00 completed boost. > [ 297.031238] rcun ffffffff82440b00 awaking rcuc1. > [ 297.031490] rcun ffffffff82440b00 awakened rcuc1. > [ 297.051130] rcun ffffffff82440b00 starting wait for work. > ib1 device: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0) > No configuration found for ib1 unused > Setting up service (localfs) network . . . . . . . . . . done > Starting RPC portmap daemon done > Setting up (remotefs) network interfaces: > Setting up service (remotefs) network . . . . . . . . . . done > Master Resource Control: runlevel 3 has been reached > [ 298.843747] rcun ffffffff82440e00 completed wait for work. > [ 298.844125] rcun ffffffff82440e00 initiating boost. > [ 298.844380] rcun ffffffff82440e00 completed boost. > [ 298.858327] rcun ffffffff82440e00 awaking rcuc52. > [ 298.858648] rcun ffffffff82440e00 awakened rcuc52. > [ 298.878014] rcun ffffffff82440e00 starting wait for work. > [ 298.887761] rcun ffffffff82440c00 completed wait for work. > [ 298.898179] rcun ffffffff82440c00 initiating boost. > [ 298.898480] rcun ffffffff82440c00 completed boost. > [ 298.917825] rcun ffffffff82440c00 awaking rcuc30. > [ 298.918148] rcun ffffffff82440c00 awakened rcuc30. > [ 298.918399] rcun ffffffff82440c00 starting wait for work. > [ 299.051210] rcun ffffffff82441300 completed wait for work. > [ 299.051271] rcun ffffffff82440d00 completed wait for work. > [ 299.051279] rcun ffffffff82440d00 initiating boost. 
> [ 299.051283] rcun ffffffff82440d00 completed boost. > [ 299.051288] rcun ffffffff82440d00 awaking rcuc40. > [ 299.051300] rcun ffffffff82440c00 completed wait for work. > [ 299.051314] rcun ffffffff82440c00 initiating boost. > [ 299.051323] rcun ffffffff82440c00 completed boost. > [ 299.051328] rcun ffffffff82440c00 awaking rcuc31. > [ 299.051356] rcun ffffffff82440d00 awakened rcuc40. > [ 299.051360] rcun ffffffff82440d00 starting wait for work. > [ 299.051383] rcun ffffffff82440c00 awakened rcuc31. > [ 299.051388] rcun ffffffff82440c00 starting wait for work. > [ 299.148138] rcun ffffffff82441300 initiating boost. > [ 299.167706] rcun ffffffff82441300 completed boost. > [ 299.167993] rcun ffffffff82441300 awaking rcuc132. > [ 299.187516] rcun ffffffff82441300 awakened rcuc132. > [ 299.187841] rcun ffffffff82441300 starting wait for work. > [ 299.223164] rcun ffffffff82441300 completed wait for work. > [ 299.223485] rcun ffffffff82441300 initiating boost. > [ 299.227367] rcun ffffffff82441300 completed boost. > [ 299.227654] rcun ffffffff82441300 awaking rcuc132. > [ 299.227969] rcun ffffffff82441300 awakened rcuc132. > [ 299.247460] rcun ffffffff82441300 starting wait for work. > [ 299.278920] rcun ffffffff82440d00 completed wait for work. > [ 299.279251] rcun ffffffff82440d00 initiating boost. > [ 299.279497] rcun ffffffff82440d00 completed boost. > [ 299.297867] rcun ffffffff82440d00 awaking rcuc40. > [ 299.298180] rcun ffffffff82440d00 awakened rcuc40. > [ 299.317166] rcun ffffffff82440d00 starting wait for work. > [ 299.334809] rcun ffffffff82441300 completed wait for work. > [ 299.336979] rcun ffffffff82441300 initiating boost. > [ 299.337250] rcun ffffffff82441300 completed boost. > [ 299.338720] rcun ffffffff82441100 completed wait for work. > [ 299.338737] rcun ffffffff82441100 initiating boost. > [ 299.338745] rcun ffffffff82441100 completed boost. > [ 299.338752] rcun ffffffff82441100 awaking rcuc102. 
> [ 299.338803] rcun ffffffff82441100 awakened rcuc102. > [ 299.338809] rcun ffffffff82441100 starting wait for work. > [ 299.342935] rcun ffffffff82440f00 completed wait for work. > [ 299.342964] rcun ffffffff82440f00 initiating boost. > [ 299.342991] rcun ffffffff82440f00 completed boost. > [ 299.343009] rcun ffffffff82440f00 awaking rcuc70. > [ 299.343064] rcun ffffffff82440f00 awakened rcuc70. > [ 299.343077] rcun ffffffff82440f00 starting wait for work. > [ 299.350627] rcun ffffffff82441100 completed wait for work. > [ 299.350634] rcun ffffffff82441100 initiating boost. > [ 299.350638] rcun ffffffff82441100 completed boost. > [ 299.350641] rcun ffffffff82441100 awaking rcuc102. > [ 299.350652] rcun ffffffff82441100 awakened rcuc102. > [ 299.350655] rcun ffffffff82441100 starting wait for work. > [ 299.350689] rcun ffffffff82440f00 completed wait for work. > [ 299.350701] rcun ffffffff82440f00 initiating boost. > [ 299.350708] rcun ffffffff82440f00 completed boost. > [ 299.350716] rcun ffffffff82440f00 awaking rcuc70. > [ 299.350807] rcun ffffffff82440f00 awakened rcuc70. > [ 299.350815] rcun ffffffff82440f00 starting wait for work. > [ 299.557141] rcun ffffffff82441300 awaking rcuc132. > [ 299.557143] rcun ffffffff82441300 awakened rcuc132. > [ 299.557145] rcun ffffffff82441300 starting wait for work. > > lb-g5plus-1t-host login: > lb-g5plus-1t-host login: [ 379.823934] INFO: task rcun0:8 blocked for more than 120 seconds. > [ 379.824295] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 379.843811] rcun0 D 0000000000000000 0 8 2 0x00000000 > [ 379.844152] ffff882070d47e90 0000000000000046 ffff882070d46000 0000000000004000 > [ 379.844178] 00000000001d1f40 ffff882070d47fd8 00000000001d1f40 ffff882070d47fd8 > [ 379.844204] 0000000000004000 00000000001d1f40 ffff882070d18000 ffff882070d4a2b0 > [ 379.844232] Call Trace: > [ 379.844255] [<ffffffff810afa0a>] ? __lock_release+0x166/0x16f > [ 379.844273] [<ffffffff81c21de9>] ? 
_raw_spin_unlock_irqrestore+0x3f/0x46
> [ 379.844287] [<ffffffff810cf297>] ? rcu_cpu_kthread_timer+0x44/0x44
> [ 379.844298] [<ffffffff810adcde>] ? trace_hardirqs_on+0xd/0xf
> [ 379.844309] [<ffffffff810cf297>] ? rcu_cpu_kthread_timer+0x44/0x44
> [ 379.844325] [<ffffffff81099e3d>] kthread+0x8c/0xa8
> [ 379.844342] [<ffffffff81c29ad4>] kernel_thread_helper+0x4/0x10
> [ 379.844353] [<ffffffff81c22080>] ? retint_restore_args+0xe/0xe
> [ 379.844364] [<ffffffff81099db1>] ? __init_kthread_worker+0x5b/0x5b
> [ 379.844375] [<ffffffff81c29ad0>] ? gs_change+0xb/0xb
> [ 379.844379] INFO: lockdep is turned off.

OK, so rcun0 covers CPUs 0-15. Looking above, we find that this means
"rcun ffffffff82440b00". And the last thing done by this rcun kthread
was "starting wait for work", which would be rcu_wait(), which is as
follows:

#define rcu_wait(cond) \
do { \
	for (;;) { \
		set_current_state(TASK_INTERRUPTIBLE); \
		if (cond) \
			break; \
		schedule(); \
	} \
	__set_current_state(TASK_RUNNING); \
} while (0)

But this has the kthread waiting with TASK_INTERRUPTIBLE, which should
prevent the "blocked for more than 120 seconds" diagnostic.

> [ 379.844595] INFO: task rcun8:576 blocked for more than 120 seconds.
> [ 379.844598] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 379.844602] rcun8 D 0000000000000000 0 576 2 0x00000000
> [ 379.844608] ffff881fff98de90 0000000000000046 ffff881fff98c000 0000000000004000
> [ 379.844634] 00000000001d1f40 ffff881fff98dfd8 00000000001d1f40 ffff881fff98dfd8
> [ 379.844658] 0000000000004000 00000000001d1f40 ffff882070d18000 ffff881fff9822b0
> [ 379.844683] Call Trace:
> [ 379.844694] [<ffffffff810afa0a>] ? __lock_release+0x166/0x16f
> [ 379.844705] [<ffffffff81c21de9>] ? _raw_spin_unlock_irqrestore+0x3f/0x46
> [ 379.844715] [<ffffffff810cf297>] ? rcu_cpu_kthread_timer+0x44/0x44
> [ 379.844726] [<ffffffff810adcde>] ? trace_hardirqs_on+0xd/0xf
> [ 379.844736] [<ffffffff810cf297>] ? rcu_cpu_kthread_timer+0x44/0x44
> [ 379.844747] [<ffffffff81099e3d>] kthread+0x8c/0xa8
> [ 379.844759] [<ffffffff81c29ad4>] kernel_thread_helper+0x4/0x10
> [ 379.844769] [<ffffffff81c22080>] ? retint_restore_args+0xe/0xe
> [ 379.844781] [<ffffffff81099db1>] ? __init_kthread_worker+0x5b/0x5b
> [ 379.844791] [<ffffffff81c29ad0>] ? gs_change+0xb/0xb
> [ 379.844794] INFO: lockdep is turned off.

Ditto here. rcun8 handles CPUs 128-143, which includes CPU 132.
So we have "rcun ffffffff82441300", which also was last seen "starting
wait for work". And therefore also was waiting TASK_INTERRUPTIBLE.

If I cannot think of anything better, I will send you a patch to the
"blocked for more than 120 seconds" code that dumps out the state of
the rcun kthreads.

							Thanx, Paul

^ permalink raw reply [flat|nested] 45+ messages in thread
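The mapping used in the analysis above (rcun0 covers CPUs 0-15, rcun8 covers CPUs 128-143, and the address printed after "rcun" identifies the node) can be sketched in C as follows. This is only an illustrative sketch, not kernel code. It assumes 16 CPUs per leaf node, which matches the two ranges quoted in the message, and it assumes that consecutive nodes sit 0x100 bytes apart starting at ffffffff82440b00, which is inferred from the addresses and rcuc numbers in this log and would differ on other builds. All helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: each rcun kthread covers 16 CPUs (0-15, ..., 128-143). */
#define CPUS_PER_RCUN 16

/* Observed in this log only: address printed for the node that wakes rcuc0. */
#define RCUN_FIRST_ADDR UINT64_C(0xffffffff82440b00)

/* Assumption inferred from this log: consecutive nodes are 0x100 bytes apart. */
#define RCUN_ADDR_STRIDE UINT64_C(0x100)

/* Index N of the rcunN kthread that would cover the given CPU. */
static inline int rcun_index_for_cpu(int cpu)
{
	return cpu / CPUS_PER_RCUN;
}

/* First CPU covered by rcunN. */
static inline int rcun_first_cpu(int idx)
{
	return idx * CPUS_PER_RCUN;
}

/* Last CPU covered by rcunN. */
static inline int rcun_last_cpu(int idx)
{
	return rcun_first_cpu(idx) + CPUS_PER_RCUN - 1;
}

/* Index N of the rcunN kthread for an address printed in the "rcun <addr>" lines. */
static inline int rcun_index_for_addr(uint64_t addr)
{
	return (int)((addr - RCUN_FIRST_ADDR) / RCUN_ADDR_STRIDE);
}
```

Under these assumptions, rcun_index_for_cpu(132) and rcun_index_for_addr(0xffffffff82441300) both give 8, and rcun_index_for_addr(0xffffffff82440b00) gives 0, which agrees with the pairing of "rcun ffffffff82441300" with rcun8 and "rcun ffffffff82440b00" with rcun0 in the message.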
end of thread, other threads:[~2011-05-28 6:38 UTC | newest]

Thread overview: 45+ messages
[not found] <tip-80d02085d99039b3b7f3a73c8896226b0cb1ba07@git.kernel.org>
2011-05-20 21:04 ` [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" Yinghai Lu
2011-05-20 22:42 ` Paul E. McKenney
2011-05-20 23:09 ` Yinghai Lu
2011-05-20 23:14 ` Paul E. McKenney
2011-05-20 23:16 ` Yinghai Lu
2011-05-20 23:49 ` Paul E. McKenney
2011-05-21 0:02 ` Yinghai Lu
2011-05-21 13:18 ` Paul E. McKenney
2011-05-21 14:08 ` Paul E. McKenney
2011-05-23 20:14 ` Yinghai Lu
2011-05-23 21:25 ` Paul E. McKenney
2011-05-23 22:01 ` Yinghai Lu
2011-05-23 22:55 ` Yinghai Lu
2011-05-23 22:58 ` Yinghai Lu
2011-05-24 1:18 ` Paul E. McKenney
2011-05-24 1:26 ` Yinghai Lu
2011-05-24 1:35 ` Paul E. McKenney
2011-05-24 21:23 ` Yinghai Lu
2011-05-25 0:05 ` Paul E. McKenney
2011-05-25 0:13 ` Yinghai Lu
2011-05-25 4:46 ` Paul E. McKenney
2011-05-25 7:24 ` Ingo Molnar
2011-05-25 20:48 ` Paul E. McKenney
2011-05-25 7:18 ` Ingo Molnar
2011-05-25 0:16 ` Paul E. McKenney
2011-05-25 0:10 ` Yinghai Lu
2011-05-25 4:52 ` Paul E. McKenney
2011-05-25 7:27 ` Ingo Molnar
2011-05-25 20:47 ` Paul E. McKenney
2011-05-25 20:52 ` Ingo Molnar
2011-05-25 22:15 ` Yinghai Lu
2011-05-25 22:34 ` Paul E. McKenney
2011-05-25 22:49 ` Yinghai Lu
2011-05-26 1:13 ` Paul E. McKenney
2011-05-26 1:30 ` Paul E. McKenney
2011-05-26 6:13 ` Ingo Molnar
2011-05-26 14:25 ` Paul E. McKenney
2011-05-26 17:43 ` Paul E. McKenney
2011-05-26 20:26 ` Ingo Molnar
2011-05-26 15:08 ` Yinghai Lu
2011-05-26 16:28 ` Paul E. McKenney
2011-05-28 1:04 ` Paul E. McKenney
2011-05-28 4:03 ` Yinghai Lu
2011-05-28 6:38 ` Paul E. McKenney
2011-05-24 1:12 ` Paul E. McKenney