linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Rich Felker <dalias@libc.org>
Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org, tglx@linutronix.de
Subject: Re: rcu_sched stalls in idle task introduced in pre-4.8?
Date: Mon, 12 Sep 2016 08:36:04 -0700	[thread overview]
Message-ID: <20160912153604.GN3851@linux.vnet.ibm.com> (raw)
In-Reply-To: <20160912151345.GW15995@brightrain.aerifal.cx>

On Mon, Sep 12, 2016 at 11:13:45AM -0400, Rich Felker wrote:
> On Sat, Sep 10, 2016 at 03:19:38AM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 08, 2016 at 06:16:53PM -0400, Rich Felker wrote:
> > > On Wed, Aug 03, 2016 at 09:16:31AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Aug 02, 2016 at 01:45:04PM -0700, Paul E. McKenney wrote:
> > > > > On Tue, Aug 02, 2016 at 04:32:17PM -0400, Rich Felker wrote:
> > > > > > On Tue, Aug 02, 2016 at 12:48:02PM -0700, Paul E. McKenney wrote:
> > > > 
> > > > [ . . . ]
> > > > 
> > > > > > > Does the problem reproduces easily?
> > > > > > 
> > > > > > Yes, it happens right after boot and repeats every 30-90 seconds or
> > > > > > so.
> > > > > 
> > > > > Well, that at least makes it easier to test any patches!
> > > > > 
> > > > > > > A bisection might be very helpful.
> > > > > > 
> > > > > > Bisection would require some manual work to setup because the whole
> > > > > > reason I was rebasing on Linus's tree was to adapt the drivers to
> > > > > > upstream infrastructure changes (the new cpuhp stuff replacing
> > > > > > notifier for cpu starting). The unfortunate way it was done, each
> > > > > > driver adds an enum to linux/cpuhotplug.h so all the patches have
> > > > > > gratuitous conflicts. In addition, for older revisions in Linus's
> > > > > > tree, there's at least one show-stopping (hang during boot) bug that
> > > > > > needs a cherry-pick to fix. There may be other small issues too. I
> > > > > > don't think they're at all insurmountible but it requires an annoying
> > > > > > amount of scripting.
> > > > > 
> > > > > I had to ask!  Might eventually be necessary, but let's see what we
> > > > > can learn from what you currently have.
> > > > 
> > > > And at first glance, my overnight run looks uglier than I would expect.
> > > > I am now running tests at v4.7, and will run other tests to see if
> > > > there really is a statistically significant degradation.  If there is,
> > > > then I might be able to bisect, though with nine-hour runs this could
> > > > take quite some time.
> > > 
> > > Any more thoughts on this? I'm testing v4.8-rc5 (plus jcore drivers
> > > not yet upstream) and it's still happening.
> > 
> > Not seeing it, but please do send me a recent splat from your dmesg and
> > your .config.
> > 
> > Because I am not seeing it, I also suggest inspecting your jcore drivers
> > with the information in Documentation/RCU/stallwarn.txt in mind.
> 
> I found a plausible cause -- it looks like the kernel timer system is
> programming the clock event device with intervals so low that another
> timer interrupt is pending before the first handler finishes, and it
> immediately interrupts the softirq part of the timer. Here's a snippet
> from ftrace event tracing:

That sort of thing could definitely result in RCU CPU stall warnings,
and much else besides!  Not so good for battery lifetime, for example,
if that is an issue in your case.

>           <idle>-0     [001] d.h.  2646.702790: irq_handler_entry: irq=72 name=jcore_pit
>           <idle>-0     [001] d.h.  2646.703398: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] d.h.  2646.703607: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] d.h.  2646.703884: softirq_raise: vec=7 [action=SCHED]
>           <idle>-0     [001] d.h.  2646.704191: irq_handler_exit: irq=72 ret=handled
>           <idle>-0     [001] d.H.  2646.704509: irq_handler_entry: irq=72 name=jcore_pit
>           <idle>-0     [001] d.H.  2646.704990: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] d.H.  2646.705182: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] d.H.  2646.705465: softirq_raise: vec=7 [action=SCHED]
>           <idle>-0     [001] d.H.  2646.705761: irq_handler_exit: irq=72 ret=handled
>           <idle>-0     [001] d.H.  2646.706071: irq_handler_entry: irq=72 name=jcore_pit
>           <idle>-0     [001] d.H.  2646.706328: irq_handler_exit: irq=72 ret=handled
>           <idle>-0     [001] ..s.  2646.706530: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s.  2646.706708: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s.  2646.706816: softirq_entry: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s.  2646.707270: softirq_exit: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s.  2646.707397: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] ..s.  2646.707641: softirq_exit: vec=9 [action=RCU]
>      ksoftirqd/1-14    [001] ..s.  2646.708569: softirq_entry: vec=1 [action=TIMER]
>      ksoftirqd/1-14    [001] ..s.  2646.708727: softirq_exit: vec=1 [action=TIMER]
>      ksoftirqd/1-14    [001] ..s.  2646.708829: softirq_entry: vec=7 [action=SCHED]
>      ksoftirqd/1-14    [001] ..s.  2646.709073: softirq_exit: vec=7 [action=SCHED]
>      ksoftirqd/1-14    [001] ..s.  2646.709180: softirq_entry: vec=9 [action=RCU]
>      ksoftirqd/1-14    [001] ..s.  2646.709882: softirq_exit: vec=9 [action=RCU]
> 
> Presumably it usually just hits one or two levels of nesting like
> above, but in the worst case it gets in some sort of feedback loop and
> the backlog grows until it lasts >21 sec. I was able to see that there
> are plenty of shorter stalls (too short for the stall detector to kick
> in) by removing the SD card and observing a multi-second delay before:
> 
> 	[  401.375518] mmc0: SPI card removed
> 
> The spi mmc host code that does this probe (no irq) is scheduled to
> run every 1s, so if the message takes >1s to appear (>0.5s avg),
> timers must be stalled.

Makes sense!

> Other interrupt activity (like serial or ethernet) breaks the stall
> immediately so it's not noticable in most interactive usage.

Interesting...  This other interrupt activity stops the timer from
posting so often, or just runs at a higher interrupt priority level?

> I was able to make it go away by increasing the min_delta for the
> clock even device to 2.5ms (chosen because I measured timer interrupts
> taking up to ~1.5ms -- keep in mind this is at 50 MHz) but hard-coding
> a number like that does not seem like a proper solution. The kernel
> timer subsystem should have some way to choose oneshot expiration
> appropriately based on feedback, or it should otherwise ensure that
> the feedback loop can't happen (maybe by waiting to program the next
> expiration from the softirq instead of the hard one, but I'm not sure
> if that's possible...?).

I am not sure what the timer guys will want to do, but it might be that
they program at the beginning to increase accuracy.  I could imagine
varying the default min_delta value based on the processor frequency,
but I have not idea whether or not that is a reasonable approach.

							Thanx, Paul

      reply	other threads:[~2016-09-12 15:36 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-02 17:04 rcu_sched stalls in idle task introduced in pre-4.8? Rich Felker
2016-08-02 18:16 ` Paul E. McKenney
2016-08-02 19:20   ` Rich Felker
2016-08-02 19:48     ` Paul E. McKenney
2016-08-02 20:32       ` Rich Felker
2016-08-02 20:45         ` Paul E. McKenney
2016-08-03 16:16           ` Paul E. McKenney
2016-09-08 22:16             ` Rich Felker
2016-09-10 10:19               ` Paul E. McKenney
2016-09-12 15:13                 ` Rich Felker
2016-09-12 15:36                   ` Paul E. McKenney [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160912153604.GN3851@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=dalias@libc.org \
    --cc=john.stultz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).