linux-kernel.vger.kernel.org archive mirror
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.cz>, Jan Kara <jack@suse.cz>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Dave Anderson <anderson@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Petr Mladek <pmladek@suse.cz>, Kay Sievers <kay@vrfy.org>
Subject: Re: [RFC PATCH 00/11] printk: safe printing in NMI context
Date: Wed, 18 Jun 2014 14:07:57 -0700
Message-ID: <20140618210757.GU4669@linux.vnet.ibm.com>
In-Reply-To: <alpine.LNX.2.00.1406182232390.2303@pobox.suse.cz>

On Wed, Jun 18, 2014 at 10:36:10PM +0200, Jiri Kosina wrote:
> On Wed, 18 Jun 2014, Paul E. McKenney wrote:
> 
> > OK, unconditional non-use of NMIs is even easier.  ;-)
> > 
> > Something like the following.
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > rcu: Don't use NMIs to dump other CPUs' stacks
> > 
> > Although NMI-based stack dumps are in principle more accurate, they are
> > also more likely to trigger deadlocks.  This commit therefore replaces
> > all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so
> > that the CPU detecting an RCU CPU stall does the stack dumping.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index c590e1201c74..777624e1329b 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -932,10 +932,7 @@ static void record_gp_stall_check_time(struct rcu_state *rsp)
> >  }
> >  
> >  /*
> > - * Dump stacks of all tasks running on stalled CPUs.  This is a fallback
> > - * for architectures that do not implement trigger_all_cpu_backtrace().
> > - * The NMI-triggered stack traces are more accurate because they are
> > - * printed by the target CPU.
> > + * Dump stacks of all tasks running on stalled CPUs.
> >   */
> >  static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
> >  {
> > @@ -1013,7 +1010,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
> >  	       (long)rsp->gpnum, (long)rsp->completed, totqlen);
> >  	if (ndetected == 0)
> >  		pr_err("INFO: Stall ended before state dump start\n");
> > -	else if (!trigger_all_cpu_backtrace())
> > +	else
> >  		rcu_dump_cpu_stacks(rsp);
> >  
> >  	/* Complain about tasks blocking the grace period. */
> > @@ -1044,8 +1041,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
> >  	pr_cont(" (t=%lu jiffies g=%ld c=%ld q=%lu)\n",
> >  		jiffies - rsp->gp_start,
> >  		(long)rsp->gpnum, (long)rsp->completed, totqlen);
> > -	if (!trigger_all_cpu_backtrace())
> > -		dump_stack();
> > +	rcu_dump_cpu_stacks(rsp);
> 
> This is prone to producing not really consistent stacktraces though,
> right? As the target task is still running at the time the stack is being
> walked, it might produce stacktraces that are potentially nonsensical.

If a CPU is stuck, the stack trace down to where it is stuck is
likely to be static.  But yes, there is some potential for confusion.
My (admittedly limited) rcutorture testing produced sensible stack traces,
but things might be a bit uglier in other situations.

> How about sending NMI to the target CPU, so that the task is actually 
> stopped, but printing its stacktrace from the CPU that detected the stall 
> while it's stopped?
> 
> That way, there is no printk()-from-NMI, but also the stacktrace is 
> guaranteed to be self-consistent.

I believe that this was what Steven was suggesting, though by using
tracing.  Of course, if my current approach isn't up to the job,
then something like this general approach would look quite good.
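
For illustration only, the "stop the target, then dump from the detecting
CPU" idea can be sketched as a userspace analogy. This is not kernel code:
SIGUSR1 stands in for the NMI, a thread stands in for the stalled CPU, an
array stands in for its stack, and every name below (fake_stack,
park_handler, dump_parked_target, and so on) is made up for the example.
The handler only parks the target and never prints, which mirrors the
"no printk()-from-NMI" property, and all printing is done by the detector
while the target is held.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define SLOTS 8

/* Hypothetical stand-in for the target CPU's stack; rewritten constantly. */
static atomic_ulong fake_stack[SLOTS];
static atomic_int parked;	/* set by the handler once the target is held */
static atomic_int release;	/* set by the detector when the dump is done */
static atomic_int quit;		/* tells the target thread to exit */

/* Stand-in for the NMI handler: it parks the target and never prints. */
static void park_handler(int sig)
{
	(void)sig;
	atomic_store(&parked, 1);
	while (!atomic_load(&release))
		;	/* spin until the detector has finished dumping */
	atomic_store(&parked, 0);
}

/* Stand-in for the busy target CPU: keeps overwriting its "stack". */
static void *target_cpu(void *arg)
{
	unsigned long n = 0;

	(void)arg;
	while (!atomic_load(&quit)) {
		atomic_store(&fake_stack[n % SLOTS], n);
		n++;
	}
	return NULL;
}

static void snapshot(unsigned long *out)
{
	for (int i = 0; i < SLOTS; i++)
		out[i] = atomic_load(&fake_stack[i]);
}

/*
 * Detector side: park the target, read its "stack" twice, print it, and
 * release the target.  Returns 1 if both reads match (the target really
 * was stopped), 0 if they differ, -1 on setup failure.
 */
int dump_parked_target(void)
{
	struct sigaction sa;
	pthread_t t;
	unsigned long a[SLOTS], b[SLOTS];
	int same;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = park_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	atomic_store(&parked, 0);
	atomic_store(&release, 0);
	atomic_store(&quit, 0);
	if (pthread_create(&t, NULL, target_cpu, NULL) != 0)
		return -1;

	pthread_kill(t, SIGUSR1);		/* "send the NMI" */
	while (!atomic_load(&parked))
		;				/* wait until the target is held */

	snapshot(a);
	snapshot(b);
	same = memcmp(a, b, sizeof(a)) == 0;
	for (int i = 0; i < SLOTS; i++)	/* printing happens here only */
		printf("slot %d: %lu\n", i, a[i]);

	atomic_store(&release, 1);		/* let the target continue */
	atomic_store(&quit, 1);
	pthread_join(t, NULL);
	return same;
}
```

Calling dump_parked_target() from a small main() (built with -pthread)
should return 1, since nothing modifies the array while the target spins in
the handler.  The analogy is loose: a real NMI handler has far stricter
constraints than a signal handler, and walking a real stack is more involved
than copying an array.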

							Thanx, Paul


