From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Amit Shah <amit.shah@redhat.com>
Cc: linux-kernel@vger.kernel.org, riel@redhat.com, mingo@kernel.org,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
	rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
	dvhart@linux.intel.com, fweisbec@gmail.com, oleg@redhat.com,
	sbw@mit.edu
Subject: Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups
Date: Fri, 22 Aug 2014 14:57:20 -0700
Message-ID: <20140822215720.GA21092@linux.vnet.ibm.com>
In-Reply-To: <20140822215343.GH2663@linux.vnet.ibm.com>

On Fri, Aug 22, 2014 at 02:53:44PM -0700, Paul E. McKenney wrote:
> On Fri, Aug 22, 2014 at 10:44:05PM +0530, Amit Shah wrote:
> > On (Fri) 22 Aug 2014 [07:48:19], Paul E. McKenney wrote:
> > > On Fri, Aug 22, 2014 at 06:26:49PM +0530, Amit Shah wrote:
> > > > On (Fri) 22 Aug 2014 [18:06:51], Amit Shah wrote:
> > > > > On (Fri) 22 Aug 2014 [17:54:53], Amit Shah wrote:
> > > > > > On (Mon) 18 Aug 2014 [21:01:49], Paul E. McKenney wrote:
> > > > > > 
> > > > > > > The odds are low over the next few days.  I am adding nastier rcutorture
> > > > > > > testing, however.  It would still be very good to get debug information
> > > > > > > from your setup.  One approach would be to convert the trace function
> > > > > > > calls into printk(), if that would help.
> > > > > > 
> > > > > > > I added a few printks along the lines of the traces in cases where
> > > > > > > rcu_nocb_poll was checked -- since that reproduces the hang.  Are the
> > > > > > > following traces sufficient, or should I keep adding more printks?
> > > > > > 
> > > > > > In the case of rcu-trace-nopoll.txt, the messages stop after a while
> > > > > > (when the guest locks up hard).  That's when I kill the qemu process.
> > > > > 
> > > > > > And this is the bt from gdb while the endless
> > > > > 
> > > > >   RCUDEBUG __call_rcu_nocb_enqueue 2146 rcu_preempt 0 WakeNot
> > > > > 
> > > > > messages are being spewed.
> > > > > 
> > > > > I can't time it, but hope it gives some indication along with the printks.
> > > > 
> > > > ... and after the system 'locks up', this is the state it's in:
> > > > 
> > > > ^C
> > > > Program received signal SIGINT, Interrupt.
> > > > native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
> > > > 50		 }
> > > > (gdb) bt
> > > > #0  native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
> > > > #1  0xffffffff8100b9c1 in arch_safe_halt () at ./arch/x86/include/asm/paravirt.h:111
> > > > #2  default_idle () at arch/x86/kernel/process.c:311
> > > > #3  0xffffffff8100c107 in arch_cpu_idle () at arch/x86/kernel/process.c:302
> > > > #4  0xffffffff8106a25a in cpuidle_idle_call () at kernel/sched/idle.c:120
> > > > #5  cpu_idle_loop () at kernel/sched/idle.c:220
> > > > #6  cpu_startup_entry (state=<optimized out>) at kernel/sched/idle.c:268
> > > > #7  0xffffffff813e068b in rest_init () at init/main.c:418
> > > > #8  0xffffffff81a8cf5a in start_kernel () at init/main.c:680
> > > > #9  0xffffffff81a8c4ba in x86_64_start_reservations (real_mode_data=<optimized out>) at arch/x86/kernel/head64.c:193
> > > > #10 0xffffffff81a8c607 in x86_64_start_kernel (real_mode_data=0x13f90 <cpu_lock_stats+29184> <error: Cannot access memory at address 0x13f90>)
> > > >     at arch/x86/kernel/head64.c:182
> > > > #11 0x0000000000000000 in ?? ()
> > > > 
> > > > 
> > > > Wondering why it's doing this.  Am stepping through
> > > > cpu_startup_entry() to see if I get any clues.
> > > 
> > > This looks to me like normal behavior in the x86 ACPI idle loop.
> > > My guess is that the lockup is caused by indefinite blocking, in
> > > which case we would expect all the CPUs to be in the idle loop.
> > 
> > Hm, found it:
> > 
> > The stall happens in do_initcalls().
> > 
> > pm_sysrq_init() is the function that causes the hang.  When I #if 0
> > the line
> > 
> >     register_sysrq_key('o', &sysrq_poweroff_op);
> > 
> > in pm_sysrq_init(), the boot proceeds normally.
> 
> Yow!!!
> 
> > What this is, and how it relates to rcu and that patch in
> > particular, is next...
> 
> Hmmm...  Please try replacing the synchronize_rcu() in
> __sysrq_swap_key_ops() with (say) schedule_timeout_interruptible(HZ / 10).
> I bet that gets rid of the hang.  (And also introduces a low-probability
> bug, but should be OK for testing.)
> 
> The other thing to try is to revert your patch that turned my event
> traces into printk()s, then put an ftrace_dump(DUMP_ALL); just after
> the synchronize_rcu() -- that might make it so that the ftrace data
> actually gets dumped out.

And one other thing to try...

Put a printk at the beginning of rcu_spawn_gp_kthread(), which is in
kernel/rcu/tree.c.  If that printk does not appear before the call
to pm_sysrq_init(), that would be an important clue.
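
For illustration only, the ordering check above can be modeled in user
space.  The sketch below is not kernel code: log_marker() stands in for
the added printk(), and boot_log, reset_log(), first_seen(), and
gp_kthread_spawned_first() are hypothetical names made up for this
example.  It records the markers in the order they are logged and
reports whether rcu_spawn_gp_kthread() showed up before pm_sysrq_init().

```c
#include <stdio.h>
#include <string.h>

#define MAX_EVENTS 8

/* Markers in the order they were logged (user-space stand-in for dmesg). */
static const char *boot_log[MAX_EVENTS];
static int boot_log_len;

/* Forget all markers logged so far. */
static void reset_log(void)
{
	boot_log_len = 0;
}

/* Stand-in for a printk() placed at the very top of a function. */
static void log_marker(const char *who)
{
	if (boot_log_len < MAX_EVENTS)
		boot_log[boot_log_len++] = who;
	printf("RCUDEBUG %s entered\n", who);
}

/* Index of the first marker logged for "who", or -1 if never seen. */
static int first_seen(const char *who)
{
	for (int i = 0; i < boot_log_len; i++)
		if (strcmp(boot_log[i], who) == 0)
			return i;
	return -1;
}

/*
 * Returns 1 if the rcu_spawn_gp_kthread() marker was logged before the
 * pm_sysrq_init() marker (or pm_sysrq_init() was never reached), and 0
 * if the spawn marker is missing or came later.  A 0 here corresponds
 * to the "important clue" case.
 */
static int gp_kthread_spawned_first(void)
{
	int spawn = first_seen("rcu_spawn_gp_kthread");
	int sysrq = first_seen("pm_sysrq_init");

	if (spawn < 0)
		return 0;
	return sysrq < 0 || spawn < sysrq;
}
```

In the real kernel the equivalent is just the one printk() at the top of
rcu_spawn_gp_kthread() in kernel/rcu/tree.c, followed by a look at where
it lands in the console log relative to the hang in pm_sysrq_init().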

							Thanx, Paul

