linux-kernel.vger.kernel.org archive mirror
From: Frederic Weisbecker <frederic@kernel.org>
To: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Anna-Maria Gleixner <anna-maria@linutronix.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Grygorii Strashko <grygorii.strashko@ti.com>
Subject: Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
Date: Thu, 27 Dec 2018 07:53:23 +0100
Message-ID: <20181227065321.GA3749@lerouge>
In-Reply-To: <c472b6bf-e4af-d807-e63f-75211ce295bd@gmail.com>

On Mon, Oct 15, 2018 at 10:58:54PM +0200, Heiner Kallweit wrote:
> On 28.09.2018 15:18, Frederic Weisbecker wrote:
> > On Thu, Sep 27, 2018 at 06:05:46PM +0200, Thomas Gleixner wrote:
> >> On Tue, 28 Aug 2018, Frederic Weisbecker wrote:
> >>> On Fri, Aug 24, 2018 at 07:06:32PM +0200, Heiner Kallweit wrote:
> >>>> I tested it and Frederic is right, it doesn't help. Can it be somehow related to
> >>>> the cpu being brought down during suspend? Because I get the warning only during
> >>>> suspend when the cpu is inactive already (but still online).
> >>>
> >>> It's hard to tell, I haven't been able to reproduce on suspend to disk/mem.
> >>>
> >>> Does this script eventually trigger it after some time?
> >>
> >> Any update to this?
> > 
> > Heiner? Can you please test the script I sent to you?
> > 
> > Thanks.
> > 
> Sorry, took some time .. And yes, running your script triggers the message too.
> 
> [   25.646015] x86: Booting SMP configuration:
> [   25.646044] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.664491] smpboot: CPU 1 is now offline
> [   25.679299] x86: Booting SMP configuration:
> [   25.679329] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.698449] smpboot: CPU 1 is now offline
> [   25.711698] x86: Booting SMP configuration:
> [   25.711727] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.729185] NOHZ: local_softirq_pending 202
> [   25.729229] NOHZ: local_softirq_pending 202
> [   25.730759] smpboot: CPU 1 is now offline
> [   25.744053] x86: Booting SMP configuration:
> [   25.744083] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.762520] smpboot: CPU 1 is now offline
> [   25.776834] x86: Booting SMP configuration:
> [   25.776863] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.794189] NOHZ: local_softirq_pending 202
> [   25.796672] smpboot: CPU 1 is now offline
> [   25.805970] x86: Booting SMP configuration:
> [   25.805999] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.827360] smpboot: CPU 1 is now offline
> [   25.839043] x86: Booting SMP configuration:
> [   25.839073] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.858184] NOHZ: local_softirq_pending 202
> [   25.862182] smpboot: CPU 1 is now offline
> [   25.873759] x86: Booting SMP configuration:
> [   25.873789] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.893385] smpboot: CPU 1 is now offline
> 

Sorry, I got sidetracked and almost forgot about it.

So this is triggered by CPU hotplug. At some point the CPU has an
opportunity to go idle, but for some reason the timer softirq is still
pending. We need to know which timer this is about and why the timer
softirq stays pending.
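
The "202" in the warning is the raw softirq pending mask printed in hex. A minimal sketch for decoding it follows; the function names are hypothetical, and it assumes the softirq numbering of include/linux/interrupt.h in kernels of this era (HI_SOFTIRQ = 0 up to RCU_SOFTIRQ = 9) and that SOFTIRQ_STOP_IDLE_MASK masks out only RCU_SOFTIRQ. Under those assumptions 0x202 is TIMER_SOFTIRQ plus RCU_SOFTIRQ, and only the timer bit is what makes the warning fire.

```shell
#!/bin/bash
# Softirq names in bit order (assumption: numbering of 4.x-era
# include/linux/interrupt.h; check your tree before relying on it).
softirq_names=(HI TIMER NET_TX NET_RX BLOCK IRQ_POLL TASKLET SCHED HRTIMER RCU)
RCU_SOFTIRQ=9

# decode_softirqs HEX: print the names of the bits set in HEX, given
# without a 0x prefix as in "NOHZ: local_softirq_pending %02x".
decode_softirqs() {
	local mask=$(( 16#$1 )) i
	local -a out=()
	for i in "${!softirq_names[@]}"; do
		if (( mask & (1 << i) )); then
			out+=("${softirq_names[i]}")
		fi
	done
	echo "${out[*]}"
}

# stop_idle_bits HEX: keep only the bits that trigger the warning,
# i.e. drop RCU_SOFTIRQ the way SOFTIRQ_STOP_IDLE_MASK does (assumption).
stop_idle_bits() {
	printf '%x\n' $(( 16#$1 & ~(1 << RCU_SOFTIRQ) ))
}

decode_softirqs 202                      # TIMER RCU
decode_softirqs "$(stop_idle_bits 202)"  # TIMER
```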

I'm going to need your help again. Can you please run the following,
repeating it if needed until it triggers the bug?

   echo 1 > /sys/devices/system/cpu/cpu1/online

   # Pause and reset tracing
   echo 0 > /sys/kernel/debug/tracing/tracing_on
   echo > /sys/kernel/debug/tracing/trace

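   # Turn on relevant events, one enable file at a time (a glob that
   # matches several files is an ambiguous redirection target)
   for f in /sys/kernel/debug/tracing/events/timer/timer_*/enable \
            /sys/kernel/debug/tracing/events/irq/softirq_*/enable; do
       echo 1 > "$f"
   done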
   echo 1 > /sys/kernel/debug/tracing/tracing_on

   # Trigger
   echo 0 > /sys/devices/system/cpu/cpu1/online

   echo 0 > /sys/kernel/debug/tracing/tracing_on

And please apply the following patch beforehand. With all that, I'll have
the relevant information stored in /sys/kernel/debug/tracing/per_cpu/cpu1/trace.
Please send its content to me. Thanks!

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 69e673b..0e57a3b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -892,6 +892,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
 			pr_warn("NOHZ: local_softirq_pending %02x\n",
 				(unsigned int) local_softirq_pending());
+			trace_dump_stack(0);
 			ratelimit++;
 		}
 		return false;
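
Since the warning does not show up on every hotplug cycle, the capture steps above may need several attempts. One way to automate that is sketched below. `capture_once`, `capture_until_hit` and the `TRACEFS`/`CPU_ONLINE` overrides are hypothetical names; the sketch assumes it runs as root with debugfs mounted at /sys/kernel/debug, that the patch above is applied, and that ftrace prints the trace_dump_stack() output under a `<stack trace>` line in CPU1's buffer.

```shell
#!/bin/bash
# One capture attempt: the steps above, with paths overridable for testing.
capture_once() {
	local t=${TRACEFS:-/sys/kernel/debug/tracing}
	local online=${CPU_ONLINE:-/sys/devices/system/cpu/cpu1/online}
	local f

	echo 1 > "$online"
	# Pause and reset tracing
	echo 0 > "$t/tracing_on"
	echo > "$t/trace"
	# Enable the timer_* and softirq_* events one file at a time
	for f in "$t"/events/timer/timer_*/enable "$t"/events/irq/softirq_*/enable; do
		echo 1 > "$f"
	done
	echo 1 > "$t/tracing_on"
	# Trigger: take CPU1 offline, then stop tracing
	echo 0 > "$online"
	echo 0 > "$t/tracing_on"
}

# Repeat capture_once up to $1 times (default 100). Return 0 as soon as
# CPU1's buffer holds a "<stack trace>" entry (assumed to come from the
# trace_dump_stack() call added by the patch), 1 otherwise.
capture_until_hit() {
	local t=${TRACEFS:-/sys/kernel/debug/tracing}
	local max=${1:-100} i
	for (( i = 1; i <= max; i++ )); do
		capture_once
		if grep -q '<stack trace>' "$t/per_cpu/cpu1/trace"; then
			echo "hit after $i attempt(s)"
			return 0
		fi
	done
	echo "no hit after $max attempts" >&2
	return 1
}
```

For example, `capture_until_hit 50` stops at the first attempt whose buffer contains the dump, leaving per_cpu/cpu1/trace ready to send. Note that the patched branch also increments `ratelimit`, so only the first few warnings after boot record a stack dump.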


Thread overview: 29+ messages
2018-08-16  6:13 Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem Heiner Kallweit
2018-08-18 11:26 ` Thomas Gleixner
2018-08-18 22:34   ` Heiner Kallweit
2018-08-24  4:12 ` Frederic Weisbecker
2018-08-24  5:59   ` Heiner Kallweit
2018-08-24  8:01     ` Thomas Gleixner
2018-08-24 14:30       ` Frederic Weisbecker
2018-08-24 17:06         ` Heiner Kallweit
2018-08-28  2:25           ` Frederic Weisbecker
2018-09-27 16:05             ` Thomas Gleixner
2018-09-28 13:18               ` Frederic Weisbecker
2018-09-28 20:35                 ` Heiner Kallweit
2018-10-15 20:58                 ` Heiner Kallweit
2018-12-24 21:11                   ` Heiner Kallweit
2018-12-27  6:53                   ` Frederic Weisbecker [this message]
2018-12-27 23:11                     ` Heiner Kallweit
2018-12-28  1:31                       ` Frederic Weisbecker
2018-12-28  6:34                         ` Heiner Kallweit
2018-12-28  6:39                           ` Heiner Kallweit
2019-01-09 22:20                             ` Heiner Kallweit
2019-01-11 21:36                               ` Frederic Weisbecker
2019-01-16  6:24                       ` Frederic Weisbecker
2019-01-16 18:42                         ` Heiner Kallweit
2019-01-24 19:37                         ` Heiner Kallweit
2019-02-14 19:05                           ` Heiner Kallweit
2019-02-14 21:47                             ` Thomas Gleixner
2019-02-14 22:33                               ` Heiner Kallweit
2019-02-15  0:31                                 ` Frederic Weisbecker
2019-02-16  9:14                                   ` Heiner Kallweit