linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
@ 2018-08-16  6:13 Heiner Kallweit
  2018-08-18 11:26 ` Thomas Gleixner
  2018-08-24  4:12 ` Frederic Weisbecker
  0 siblings, 2 replies; 29+ messages in thread
From: Heiner Kallweit @ 2018-08-16  6:13 UTC (permalink / raw)
  To: Anna-Maria Gleixner, Thomas Gleixner; +Cc: Linux Kernel Mailing List

Recently I started to get warning "NOHZ: local_softirq_pending 202" and
I think it's related to mentioned commit (didn't bisect it yet).
See log from suspending.

I have no reason to think the fix is wrong, it may just have revealed
another issue which existed before and was hidden by the bug.

Rgds, Heiner

[   75.073353] random: crng init done
[   75.073402] random: 7 urandom warning(s) missed due to ratelimiting
[   78.619564] PM: suspend entry (deep)
[   78.619675] PM: Syncing filesystems ... done.
[   78.653684] Freezing user space processes ... (elapsed 0.002 seconds) done.
[   78.656094] OOM killer disabled.
[   78.656113] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   78.658177] Suspending console(s) (use no_console_suspend to debug)
[   78.663066] nuvoton-cir 00:07: disabled
[   78.671817] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[   78.672210] sd 0:0:0:0: [sda] Stopping disk
[   78.786651] ACPI: Preparing to enter system sleep state S3
[   78.789613] PM: Saving platform NVS memory
[   78.789759] Disabling non-boot CPUs ...
[   78.805154] NOHZ: local_softirq_pending 202
[   78.805182] NOHZ: local_softirq_pending 202
[   78.807102] smpboot: CPU 1 is now offline

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-08-16  6:13 Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem Heiner Kallweit
@ 2018-08-18 11:26 ` Thomas Gleixner
  2018-08-18 22:34   ` Heiner Kallweit
  2018-08-24  4:12 ` Frederic Weisbecker
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2018-08-18 11:26 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Anna-Maria Gleixner, Linux Kernel Mailing List, Frederic Weisbecker

On Thu, 16 Aug 2018, Heiner Kallweit wrote:

> Recently I started to get warning "NOHZ: local_softirq_pending 202" and
> I think it's related to mentioned commit (didn't bisect it yet).
> See log from suspending.
> 
> I have no reason to think the fix is wrong, it may just have revealed
> another issue which existed before and was hidden by the bug.

Looks so. That seems to be related to CPU offlining. No idea yet...

> Rgds, Heiner
> 
> [   75.073353] random: crng init done
> [   75.073402] random: 7 urandom warning(s) missed due to ratelimiting
> [   78.619564] PM: suspend entry (deep)
> [   78.619675] PM: Syncing filesystems ... done.
> [   78.653684] Freezing user space processes ... (elapsed 0.002 seconds) done.
> [   78.656094] OOM killer disabled.
> [   78.656113] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [   78.658177] Suspending console(s) (use no_console_suspend to debug)
> [   78.663066] nuvoton-cir 00:07: disabled
> [   78.671817] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> [   78.672210] sd 0:0:0:0: [sda] Stopping disk
> [   78.786651] ACPI: Preparing to enter system sleep state S3
> [   78.789613] PM: Saving platform NVS memory
> [   78.789759] Disabling non-boot CPUs ...
> [   78.805154] NOHZ: local_softirq_pending 202
> [   78.805182] NOHZ: local_softirq_pending 202
> [   78.807102] smpboot: CPU 1 is now offline
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-08-18 11:26 ` Thomas Gleixner
@ 2018-08-18 22:34   ` Heiner Kallweit
  0 siblings, 0 replies; 29+ messages in thread
From: Heiner Kallweit @ 2018-08-18 22:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Anna-Maria Gleixner, Linux Kernel Mailing List, Frederic Weisbecker

On 18.08.2018 13:26, Thomas Gleixner wrote:
> On Thu, 16 Aug 2018, Heiner Kallweit wrote:
> 
>> Recently I started to get warning "NOHZ: local_softirq_pending 202" and
>> I think it's related to mentioned commit (didn't bisect it yet).
>> See log from suspending.
>>
>> I have no reason to think the fix is wrong, it may just have revealed
>> another issue which existed before and was hidden by the bug.
> 
> Looks so. That seems to be related to CPU offlining. No idea yet...
> 
I checked a little further and at the time the warning is printed the
cpu is still online but not active any longer.
I can avoid the warning with the following change, but as a
disclaimer: I have no clue of this subsystem and don't know what
I'm doing ..

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 5b33e2f5c..19a030e40 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -862,13 +862,13 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
 static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 {
 	/*
-	 * If this CPU is offline and it is the one which updates
+	 * If this CPU is inactive and it is the one which updates
 	 * jiffies, then give up the assignment and let it be taken by
 	 * the CPU which runs the tick timer next. If we don't drop
 	 * this here the jiffies might be stale and do_timer() never
 	 * invoked.
 	 */
-	if (unlikely(!cpu_online(cpu))) {
+	if (unlikely(!cpu_active(cpu))) {
 		if (cpu == tick_do_timer_cpu)
 			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
 		/*
-- 



>> Rgds, Heiner
>>
>> [   75.073353] random: crng init done
>> [   75.073402] random: 7 urandom warning(s) missed due to ratelimiting
>> [   78.619564] PM: suspend entry (deep)
>> [   78.619675] PM: Syncing filesystems ... done.
>> [   78.653684] Freezing user space processes ... (elapsed 0.002 seconds) done.
>> [   78.656094] OOM killer disabled.
>> [   78.656113] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>> [   78.658177] Suspending console(s) (use no_console_suspend to debug)
>> [   78.663066] nuvoton-cir 00:07: disabled
>> [   78.671817] sd 0:0:0:0: [sda] Synchronizing SCSI cache
>> [   78.672210] sd 0:0:0:0: [sda] Stopping disk
>> [   78.786651] ACPI: Preparing to enter system sleep state S3
>> [   78.789613] PM: Saving platform NVS memory
>> [   78.789759] Disabling non-boot CPUs ...
>> [   78.805154] NOHZ: local_softirq_pending 202
>> [   78.805182] NOHZ: local_softirq_pending 202
>> [   78.807102] smpboot: CPU 1 is now offline
>>
> 


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-08-16  6:13 Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem Heiner Kallweit
  2018-08-18 11:26 ` Thomas Gleixner
@ 2018-08-24  4:12 ` Frederic Weisbecker
  2018-08-24  5:59   ` Heiner Kallweit
  1 sibling, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2018-08-24  4:12 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Anna-Maria Gleixner, Thomas Gleixner, Linux Kernel Mailing List,
	Grygorii Strashko

On Thu, Aug 16, 2018 at 08:13:03AM +0200, Heiner Kallweit wrote:
> Recently I started to get warning "NOHZ: local_softirq_pending 202" and
> I think it's related to mentioned commit (didn't bisect it yet).
> See log from suspending.
> 
> I have no reason to think the fix is wrong, it may just have revealed
> another issue which existed before and was hidden by the bug.
> 
> Rgds, Heiner
> 
> [   75.073353] random: crng init done
> [   75.073402] random: 7 urandom warning(s) missed due to ratelimiting
> [   78.619564] PM: suspend entry (deep)
> [   78.619675] PM: Syncing filesystems ... done.
> [   78.653684] Freezing user space processes ... (elapsed 0.002 seconds) done.
> [   78.656094] OOM killer disabled.
> [   78.656113] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [   78.658177] Suspending console(s) (use no_console_suspend to debug)
> [   78.663066] nuvoton-cir 00:07: disabled
> [   78.671817] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> [   78.672210] sd 0:0:0:0: [sda] Stopping disk
> [   78.786651] ACPI: Preparing to enter system sleep state S3
> [   78.789613] PM: Saving platform NVS memory
> [   78.789759] Disabling non-boot CPUs ...
> [   78.805154] NOHZ: local_softirq_pending 202
> [   78.805182] NOHZ: local_softirq_pending 202
> [   78.807102] smpboot: CPU 1 is now offline

I've tried to reproduce with suspend on disk but got unsuccessful.

A small question as I see someone is having a similar issue with a stable
release only. On which kernel did you trigger that: upstream or stable?

I'll continue investigating.

Thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-08-24  4:12 ` Frederic Weisbecker
@ 2018-08-24  5:59   ` Heiner Kallweit
  2018-08-24  8:01     ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Heiner Kallweit @ 2018-08-24  5:59 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Anna-Maria Gleixner, Thomas Gleixner, Linux Kernel Mailing List,
	Grygorii Strashko

On 24.08.2018 06:12, Frederic Weisbecker wrote:
> On Thu, Aug 16, 2018 at 08:13:03AM +0200, Heiner Kallweit wrote:
>> Recently I started to get warning "NOHZ: local_softirq_pending 202" and
>> I think it's related to mentioned commit (didn't bisect it yet).
>> See log from suspending.
>>
>> I have no reason to think the fix is wrong, it may just have revealed
>> another issue which existed before and was hidden by the bug.
>>
>> Rgds, Heiner
>>
>> [   75.073353] random: crng init done
>> [   75.073402] random: 7 urandom warning(s) missed due to ratelimiting
>> [   78.619564] PM: suspend entry (deep)
>> [   78.619675] PM: Syncing filesystems ... done.
>> [   78.653684] Freezing user space processes ... (elapsed 0.002 seconds) done.
>> [   78.656094] OOM killer disabled.
>> [   78.656113] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>> [   78.658177] Suspending console(s) (use no_console_suspend to debug)
>> [   78.663066] nuvoton-cir 00:07: disabled
>> [   78.671817] sd 0:0:0:0: [sda] Synchronizing SCSI cache
>> [   78.672210] sd 0:0:0:0: [sda] Stopping disk
>> [   78.786651] ACPI: Preparing to enter system sleep state S3
>> [   78.789613] PM: Saving platform NVS memory
>> [   78.789759] Disabling non-boot CPUs ...
>> [   78.805154] NOHZ: local_softirq_pending 202
>> [   78.805182] NOHZ: local_softirq_pending 202
>> [   78.807102] smpboot: CPU 1 is now offline
> 
> I've tried to reproduce with suspend on disk but got unsuccessful.
> 
> A small question as I see someone is having a similar issue with a stable
> release only. On which kernel did you trigger that: upstream or stable?
> 
> I'll continue investigating.
> 
> Thanks.
> 
Affected is recent linux-next, after the commit mentioned in the subject.
I can work around the warning (not sure whether it's a proper fix),
see here:
https://lkml.org/lkml/2018/8/18/272

Rgds, Heiner

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-08-24  5:59   ` Heiner Kallweit
@ 2018-08-24  8:01     ` Thomas Gleixner
  2018-08-24 14:30       ` Frederic Weisbecker
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2018-08-24  8:01 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Frederic Weisbecker, Anna-Maria Gleixner,
	Linux Kernel Mailing List, Grygorii Strashko

On Fri, 24 Aug 2018, Heiner Kallweit wrote:
> On 24.08.2018 06:12, Frederic Weisbecker wrote:
> > On Thu, Aug 16, 2018 at 08:13:03AM +0200, Heiner Kallweit wrote:
> >> Recently I started to get warning "NOHZ: local_softirq_pending 202" and
> >> I think it's related to mentioned commit (didn't bisect it yet).
> >> See log from suspending.
> >>
> >> I have no reason to think the fix is wrong, it may just have revealed
> >> another issue which existed before and was hidden by the bug.
> >>
> >> Rgds, Heiner
> >>
> >> [   75.073353] random: crng init done
> >> [   75.073402] random: 7 urandom warning(s) missed due to ratelimiting
> >> [   78.619564] PM: suspend entry (deep)
> >> [   78.619675] PM: Syncing filesystems ... done.
> >> [   78.653684] Freezing user space processes ... (elapsed 0.002 seconds) done.
> >> [   78.656094] OOM killer disabled.
> >> [   78.656113] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> >> [   78.658177] Suspending console(s) (use no_console_suspend to debug)
> >> [   78.663066] nuvoton-cir 00:07: disabled
> >> [   78.671817] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> >> [   78.672210] sd 0:0:0:0: [sda] Stopping disk
> >> [   78.786651] ACPI: Preparing to enter system sleep state S3
> >> [   78.789613] PM: Saving platform NVS memory
> >> [   78.789759] Disabling non-boot CPUs ...
> >> [   78.805154] NOHZ: local_softirq_pending 202
> >> [   78.805182] NOHZ: local_softirq_pending 202
> >> [   78.807102] smpboot: CPU 1 is now offline
> > 
> > I've tried to reproduce with suspend on disk but got unsuccessful.
> > 
> > A small question as I see someone is having a similar issue with a stable
> > release only. On which kernel did you trigger that: upstream or stable?
> > 
> > I'll continue investigating.
> > 
> > Thanks.
> > 
> Affected is recent linux-next, after the commit mentioned in the subject.
> I can work around the warning (not sure whether it's a proper fix),
> see here:
> https://lkml.org/lkml/2018/8/18/272

Can you try the one I posted in this thread:

 https://lkml.kernel.org/r/alpine.DEB.2.21.1808240851420.1668@nanos.tec.linutronix.de

Also below for reference.

Thanks,

	tglx

8<----------------
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 5b33e2f5c0ed..6aab9d54a331 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -888,7 +888,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 	if (unlikely(local_softirq_pending() && cpu_online(cpu))) {
 		static int ratelimit;
 
-		if (ratelimit < 10 &&
+		if (ratelimit < 10 && !in_softirq() &&
 		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
 			pr_warn("NOHZ: local_softirq_pending %02x\n",
 				(unsigned int) local_softirq_pending());

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-08-24  8:01     ` Thomas Gleixner
@ 2018-08-24 14:30       ` Frederic Weisbecker
  2018-08-24 17:06         ` Heiner Kallweit
  0 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2018-08-24 14:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Heiner Kallweit, Anna-Maria Gleixner, Linux Kernel Mailing List,
	Grygorii Strashko

On Fri, Aug 24, 2018 at 10:01:35AM +0200, Thomas Gleixner wrote:
> On Fri, 24 Aug 2018, Heiner Kallweit wrote:
> > On 24.08.2018 06:12, Frederic Weisbecker wrote:
> > > On Thu, Aug 16, 2018 at 08:13:03AM +0200, Heiner Kallweit wrote:
> > >> Recently I started to get warning "NOHZ: local_softirq_pending 202" and
> > >> I think it's related to mentioned commit (didn't bisect it yet).
> > >> See log from suspending.
> > >>
> > >> I have no reason to think the fix is wrong, it may just have revealed
> > >> another issue which existed before and was hidden by the bug.
> > >>
> > >> Rgds, Heiner
> > >>
> > >> [   75.073353] random: crng init done
> > >> [   75.073402] random: 7 urandom warning(s) missed due to ratelimiting
> > >> [   78.619564] PM: suspend entry (deep)
> > >> [   78.619675] PM: Syncing filesystems ... done.
> > >> [   78.653684] Freezing user space processes ... (elapsed 0.002 seconds) done.
> > >> [   78.656094] OOM killer disabled.
> > >> [   78.656113] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> > >> [   78.658177] Suspending console(s) (use no_console_suspend to debug)
> > >> [   78.663066] nuvoton-cir 00:07: disabled
> > >> [   78.671817] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> > >> [   78.672210] sd 0:0:0:0: [sda] Stopping disk
> > >> [   78.786651] ACPI: Preparing to enter system sleep state S3
> > >> [   78.789613] PM: Saving platform NVS memory
> > >> [   78.789759] Disabling non-boot CPUs ...
> > >> [   78.805154] NOHZ: local_softirq_pending 202
> > >> [   78.805182] NOHZ: local_softirq_pending 202
> > >> [   78.807102] smpboot: CPU 1 is now offline
> > > 
> > > I've tried to reproduce with suspend on disk but got unsuccessful.
> > > 
> > > A small question as I see someone is having a similar issue with a stable
> > > release only. On which kernel did you trigger that: upstream or stable?
> > > 
> > > I'll continue investigating.
> > > 
> > > Thanks.
> > > 
> > Affected is recent linux-next, after the commit mentioned in the subject.
> > I can work around the warning (not sure whether it's a proper fix),
> > see here:
> > https://lkml.org/lkml/2018/8/18/272
> 
> Can you try the one I posted in this thread:
> 
>  https://lkml.kernel.org/r/alpine.DEB.2.21.1808240851420.1668@nanos.tec.linutronix.de
> 
> Also below for reference.
> 
> Thanks,
> 
> 	tglx
> 
> 8<----------------
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 5b33e2f5c0ed..6aab9d54a331 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -888,7 +888,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>  	if (unlikely(local_softirq_pending() && cpu_online(cpu))) {
>  		static int ratelimit;
>  
> -		if (ratelimit < 10 &&
> +		if (ratelimit < 10 && !in_softirq() &&
>  		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
>  			pr_warn("NOHZ: local_softirq_pending %02x\n",
>  				(unsigned int) local_softirq_pending());

I fear it may not work in his case because it happens in -next and we don't stop
the idle tick from IRQ tail anymore. So we shouldn't be interrupting a softirq
in this path. Still it's worth trying, I may well be missing something.

Thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-08-24 14:30       ` Frederic Weisbecker
@ 2018-08-24 17:06         ` Heiner Kallweit
  2018-08-28  2:25           ` Frederic Weisbecker
  0 siblings, 1 reply; 29+ messages in thread
From: Heiner Kallweit @ 2018-08-24 17:06 UTC (permalink / raw)
  To: Frederic Weisbecker, Thomas Gleixner
  Cc: Anna-Maria Gleixner, Linux Kernel Mailing List, Grygorii Strashko

On 24.08.2018 16:30, Frederic Weisbecker wrote:
> On Fri, Aug 24, 2018 at 10:01:35AM +0200, Thomas Gleixner wrote:
>> On Fri, 24 Aug 2018, Heiner Kallweit wrote:
>>> On 24.08.2018 06:12, Frederic Weisbecker wrote:
>>>> On Thu, Aug 16, 2018 at 08:13:03AM +0200, Heiner Kallweit wrote:
>>>>> Recently I started to get warning "NOHZ: local_softirq_pending 202" and
>>>>> I think it's related to mentioned commit (didn't bisect it yet).
>>>>> See log from suspending.
>>>>>
>>>>> I have no reason to think the fix is wrong, it may just have revealed
>>>>> another issue which existed before and was hidden by the bug.
>>>>>
>>>>> Rgds, Heiner
>>>>>
>>>>> [   75.073353] random: crng init done
>>>>> [   75.073402] random: 7 urandom warning(s) missed due to ratelimiting
>>>>> [   78.619564] PM: suspend entry (deep)
>>>>> [   78.619675] PM: Syncing filesystems ... done.
>>>>> [   78.653684] Freezing user space processes ... (elapsed 0.002 seconds) done.
>>>>> [   78.656094] OOM killer disabled.
>>>>> [   78.656113] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>>>> [   78.658177] Suspending console(s) (use no_console_suspend to debug)
>>>>> [   78.663066] nuvoton-cir 00:07: disabled
>>>>> [   78.671817] sd 0:0:0:0: [sda] Synchronizing SCSI cache
>>>>> [   78.672210] sd 0:0:0:0: [sda] Stopping disk
>>>>> [   78.786651] ACPI: Preparing to enter system sleep state S3
>>>>> [   78.789613] PM: Saving platform NVS memory
>>>>> [   78.789759] Disabling non-boot CPUs ...
>>>>> [   78.805154] NOHZ: local_softirq_pending 202
>>>>> [   78.805182] NOHZ: local_softirq_pending 202
>>>>> [   78.807102] smpboot: CPU 1 is now offline
>>>>
>>>> I've tried to reproduce with suspend on disk but got unsuccessful.
>>>>
>>>> A small question as I see someone is having a similar issue with a stable
>>>> release only. On which kernel did you trigger that: upstream or stable?
>>>>
>>>> I'll continue investigating.
>>>>
>>>> Thanks.
>>>>
>>> Affected is recent linux-next, after the commit mentioned in the subject.
>>> I can work around the warning (not sure whether it's a proper fix),
>>> see here:
>>> https://lkml.org/lkml/2018/8/18/272
>>
>> Can you try the one I posted in this thread:
>>
>>  https://lkml.kernel.org/r/alpine.DEB.2.21.1808240851420.1668@nanos.tec.linutronix.de
>>
>> Also below for reference.
>>
>> Thanks,
>>
>> 	tglx
>>
>> 8<----------------
>> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
>> index 5b33e2f5c0ed..6aab9d54a331 100644
>> --- a/kernel/time/tick-sched.c
>> +++ b/kernel/time/tick-sched.c
>> @@ -888,7 +888,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>>  	if (unlikely(local_softirq_pending() && cpu_online(cpu))) {
>>  		static int ratelimit;
>>  
>> -		if (ratelimit < 10 &&
>> +		if (ratelimit < 10 && !in_softirq() &&
>>  		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
>>  			pr_warn("NOHZ: local_softirq_pending %02x\n",
>>  				(unsigned int) local_softirq_pending());
> 
> I fear it may not work in his case because it happens in -next and we don't stop
> the idle tick from IRQ tail anymore. So we shouldn't be interrupting a softirq
> in this path. Still it's worth trying, I may well be missing something.
> 
> Thanks.
> 
I tested it and Frederic is right, it doesn't help. Can it be somehow related to
the cpu being brought down during suspend? Because I get the warning only during
suspend when the cpu is inactive already (but still online).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-08-24 17:06         ` Heiner Kallweit
@ 2018-08-28  2:25           ` Frederic Weisbecker
  2018-09-27 16:05             ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2018-08-28  2:25 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Thomas Gleixner, Anna-Maria Gleixner, Linux Kernel Mailing List,
	Grygorii Strashko

[-- Attachment #1: Type: text/plain, Size: 1623 bytes --]

On Fri, Aug 24, 2018 at 07:06:32PM +0200, Heiner Kallweit wrote:
> On 24.08.2018 16:30, Frederic Weisbecker wrote:
> >> Can you try the one I posted in this thread:
> >>
> >>  https://lkml.kernel.org/r/alpine.DEB.2.21.1808240851420.1668@nanos.tec.linutronix.de
> >>
> >> Also below for reference.
> >>
> >> Thanks,
> >>
> >> 	tglx
> >>
> >> 8<----------------
> >> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> >> index 5b33e2f5c0ed..6aab9d54a331 100644
> >> --- a/kernel/time/tick-sched.c
> >> +++ b/kernel/time/tick-sched.c
> >> @@ -888,7 +888,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
> >>  	if (unlikely(local_softirq_pending() && cpu_online(cpu))) {
> >>  		static int ratelimit;
> >>  
> >> -		if (ratelimit < 10 &&
> >> +		if (ratelimit < 10 && !in_softirq() &&
> >>  		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
> >>  			pr_warn("NOHZ: local_softirq_pending %02x\n",
> >>  				(unsigned int) local_softirq_pending());
> > 
> > I fear it may not work in his case because it happens in -next and we don't stop
> > the idle tick from IRQ tail anymore. So we shouldn't be interrupting a softirq
> > in this path. Still it's worth trying, I may well be missing something.
> > 
> > Thanks.
> > 
> I tested it and Frederic is right, it doesn't help. Can it be somehow related to
> the cpu being brought down during suspend? Because I get the warning only during
> suspend when the cpu is inactive already (but still online).

It's hard to tell, I haven't been able to reproduce on suspend to disk/mem.

Does this script eventually trigger it after some time?


[-- Attachment #2: hotplug_fun --]
[-- Type: text/plain, Size: 205 bytes --]

#!/bin/bash

do_hotplug()
{
	for i in $(seq 1 $2)
	do
		echo $1 > /sys/devices/system/cpu/cpu$i/online
	done
}

LAST_CPU=$(($(nproc)-1))

while true
do
	do_hotplug 0 $LAST_CPU
	do_hotplug 1 $LAST_CPU
done

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-08-28  2:25           ` Frederic Weisbecker
@ 2018-09-27 16:05             ` Thomas Gleixner
  2018-09-28 13:18               ` Frederic Weisbecker
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2018-09-27 16:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Heiner Kallweit, Anna-Maria Gleixner, Linux Kernel Mailing List,
	Grygorii Strashko

On Tue, 28 Aug 2018, Frederic Weisbecker wrote:
> On Fri, Aug 24, 2018 at 07:06:32PM +0200, Heiner Kallweit wrote:
> > I tested it and Frederic is right, it doesn't help. Can it be somehow related to
> > the cpu being brought down during suspend? Because I get the warning only during
> > suspend when the cpu is inactive already (but still online).
> 
> It's hard to tell, I haven't been able to reproduce on suspend to disk/mem.
> 
> Does this script eventually trigger it after some time?

Any update to this?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-09-27 16:05             ` Thomas Gleixner
@ 2018-09-28 13:18               ` Frederic Weisbecker
  2018-09-28 20:35                 ` Heiner Kallweit
  2018-10-15 20:58                 ` Heiner Kallweit
  0 siblings, 2 replies; 29+ messages in thread
From: Frederic Weisbecker @ 2018-09-28 13:18 UTC (permalink / raw)
  To: Thomas Gleixner, Heiner Kallweit
  Cc: Anna-Maria Gleixner, Linux Kernel Mailing List, Grygorii Strashko

On Thu, Sep 27, 2018 at 06:05:46PM +0200, Thomas Gleixner wrote:
> On Tue, 28 Aug 2018, Frederic Weisbecker wrote:
> > On Fri, Aug 24, 2018 at 07:06:32PM +0200, Heiner Kallweit wrote:
> > > I tested it and Frederic is right, it doesn't help. Can it be somehow related to
> > > the cpu being brought down during suspend? Because I get the warning only during
> > > suspend when the cpu is inactive already (but still online).
> > 
> > It's hard to tell, I haven't been able to reproduce on suspend to disk/mem.
> > 
> > Does this script eventually trigger it after some time?
> 
> Any update to this?

Heiner? Can you please test the script I sent to you?

Thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-09-28 13:18               ` Frederic Weisbecker
@ 2018-09-28 20:35                 ` Heiner Kallweit
  2018-10-15 20:58                 ` Heiner Kallweit
  1 sibling, 0 replies; 29+ messages in thread
From: Heiner Kallweit @ 2018-09-28 20:35 UTC (permalink / raw)
  To: Frederic Weisbecker, Thomas Gleixner
  Cc: Anna-Maria Gleixner, Linux Kernel Mailing List, Grygorii Strashko

On 28.09.2018 15:18, Frederic Weisbecker wrote:
> On Thu, Sep 27, 2018 at 06:05:46PM +0200, Thomas Gleixner wrote:
>> On Tue, 28 Aug 2018, Frederic Weisbecker wrote:
>>> On Fri, Aug 24, 2018 at 07:06:32PM +0200, Heiner Kallweit wrote:
>>>> I tested it and Frederic is right, it doesn't help. Can it be somehow related to
>>>> the cpu being brought down during suspend? Because I get the warning only during
>>>> suspend when the cpu is inactive already (but still online).
>>>
>>> It's hard to tell, I haven't been able to reproduce on suspend to disk/mem.
>>>
>>> Does this script eventually trigger it after some time?
>>
>> Any update to this?
> 
> Heiner? Can you please test the script I sent to you?
> 
> Thanks.
> 
Sorry, this wasn't on my radar any longer. Next week I'm on
travel, I can do it the week after.

Heiner

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-09-28 13:18               ` Frederic Weisbecker
  2018-09-28 20:35                 ` Heiner Kallweit
@ 2018-10-15 20:58                 ` Heiner Kallweit
  2018-12-24 21:11                   ` Heiner Kallweit
  2018-12-27  6:53                   ` Frederic Weisbecker
  1 sibling, 2 replies; 29+ messages in thread
From: Heiner Kallweit @ 2018-10-15 20:58 UTC (permalink / raw)
  To: Frederic Weisbecker, Thomas Gleixner
  Cc: Anna-Maria Gleixner, Linux Kernel Mailing List, Grygorii Strashko

On 28.09.2018 15:18, Frederic Weisbecker wrote:
> On Thu, Sep 27, 2018 at 06:05:46PM +0200, Thomas Gleixner wrote:
>> On Tue, 28 Aug 2018, Frederic Weisbecker wrote:
>>> On Fri, Aug 24, 2018 at 07:06:32PM +0200, Heiner Kallweit wrote:
>>>> I tested it and Frederic is right, it doesn't help. Can it be somehow related to
>>>> the cpu being brought down during suspend? Because I get the warning only during
>>>> suspend when the cpu is inactive already (but still online).
>>>
>>> It's hard to tell, I haven't been able to reproduce on suspend to disk/mem.
>>>
>>> Does this script eventually trigger it after some time?
>>
>> Any update to this?
> 
> Heiner? Can you please test the script I sent to you?
> 
> Thanks.
> 
Sorry, took some time .. And yes, running your script triggers the message too.

[   25.646015] x86: Booting SMP configuration:
[   25.646044] smpboot: Booting Node 0 Processor 1 APIC 0x2
[   25.664491] smpboot: CPU 1 is now offline
[   25.679299] x86: Booting SMP configuration:
[   25.679329] smpboot: Booting Node 0 Processor 1 APIC 0x2
[   25.698449] smpboot: CPU 1 is now offline
[   25.711698] x86: Booting SMP configuration:
[   25.711727] smpboot: Booting Node 0 Processor 1 APIC 0x2
[   25.729185] NOHZ: local_softirq_pending 202
[   25.729229] NOHZ: local_softirq_pending 202
[   25.730759] smpboot: CPU 1 is now offline
[   25.744053] x86: Booting SMP configuration:
[   25.744083] smpboot: Booting Node 0 Processor 1 APIC 0x2
[   25.762520] smpboot: CPU 1 is now offline
[   25.776834] x86: Booting SMP configuration:
[   25.776863] smpboot: Booting Node 0 Processor 1 APIC 0x2
[   25.794189] NOHZ: local_softirq_pending 202
[   25.796672] smpboot: CPU 1 is now offline
[   25.805970] x86: Booting SMP configuration:
[   25.805999] smpboot: Booting Node 0 Processor 1 APIC 0x2
[   25.827360] smpboot: CPU 1 is now offline
[   25.839043] x86: Booting SMP configuration:
[   25.839073] smpboot: Booting Node 0 Processor 1 APIC 0x2
[   25.858184] NOHZ: local_softirq_pending 202
[   25.862182] smpboot: CPU 1 is now offline
[   25.873759] x86: Booting SMP configuration:
[   25.873789] smpboot: Booting Node 0 Processor 1 APIC 0x2
[   25.893385] smpboot: CPU 1 is now offline


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-10-15 20:58                 ` Heiner Kallweit
@ 2018-12-24 21:11                   ` Heiner Kallweit
  2018-12-27  6:53                   ` Frederic Weisbecker
  1 sibling, 0 replies; 29+ messages in thread
From: Heiner Kallweit @ 2018-12-24 21:11 UTC (permalink / raw)
  To: Frederic Weisbecker, Thomas Gleixner
  Cc: Anna-Maria Gleixner, Linux Kernel Mailing List, Grygorii Strashko

On 15.10.2018 22:58, Heiner Kallweit wrote:
> On 28.09.2018 15:18, Frederic Weisbecker wrote:
>> On Thu, Sep 27, 2018 at 06:05:46PM +0200, Thomas Gleixner wrote:
>>> On Tue, 28 Aug 2018, Frederic Weisbecker wrote:
>>>> On Fri, Aug 24, 2018 at 07:06:32PM +0200, Heiner Kallweit wrote:
>>>>> I tested it and Frederic is right, it doesn't help. Can it be somehow related to
>>>>> the cpu being brought down during suspend? Because I get the warning only during
>>>>> suspend when the cpu is inactive already (but still online).
>>>>
>>>> It's hard to tell, I haven't been able to reproduce on suspend to disk/mem.
>>>>
>>>> Does this script eventually trigger it after some time?
>>>
>>> Any update to this?
>>
>> Heiner? Can you please test the script I sent to you?
>>
>> Thanks.
>>
> Sorry, took some time .. And yes, running your script triggers the message too.
> 
> [   25.646015] x86: Booting SMP configuration:
> [   25.646044] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.664491] smpboot: CPU 1 is now offline
> [   25.679299] x86: Booting SMP configuration:
> [   25.679329] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.698449] smpboot: CPU 1 is now offline
> [   25.711698] x86: Booting SMP configuration:
> [   25.711727] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.729185] NOHZ: local_softirq_pending 202
> [   25.729229] NOHZ: local_softirq_pending 202
> [   25.730759] smpboot: CPU 1 is now offline
> [   25.744053] x86: Booting SMP configuration:
> [   25.744083] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.762520] smpboot: CPU 1 is now offline
> [   25.776834] x86: Booting SMP configuration:
> [   25.776863] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.794189] NOHZ: local_softirq_pending 202
> [   25.796672] smpboot: CPU 1 is now offline
> [   25.805970] x86: Booting SMP configuration:
> [   25.805999] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.827360] smpboot: CPU 1 is now offline
> [   25.839043] x86: Booting SMP configuration:
> [   25.839073] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.858184] NOHZ: local_softirq_pending 202
> [   25.862182] smpboot: CPU 1 is now offline
> [   25.873759] x86: Booting SMP configuration:
> [   25.873789] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.893385] smpboot: CPU 1 is now offline
> 
Almost forgot about his topic, but the warning is still there.
Has there been any progress in analysis?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-10-15 20:58                 ` Heiner Kallweit
  2018-12-24 21:11                   ` Heiner Kallweit
@ 2018-12-27  6:53                   ` Frederic Weisbecker
  2018-12-27 23:11                     ` Heiner Kallweit
  1 sibling, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2018-12-27  6:53 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Thomas Gleixner, Anna-Maria Gleixner, Linux Kernel Mailing List,
	Grygorii Strashko

On Mon, Oct 15, 2018 at 10:58:54PM +0200, Heiner Kallweit wrote:
> On 28.09.2018 15:18, Frederic Weisbecker wrote:
> > On Thu, Sep 27, 2018 at 06:05:46PM +0200, Thomas Gleixner wrote:
> >> On Tue, 28 Aug 2018, Frederic Weisbecker wrote:
> >>> On Fri, Aug 24, 2018 at 07:06:32PM +0200, Heiner Kallweit wrote:
> >>>> I tested it and Frederic is right, it doesn't help. Can it be somehow related to
> >>>> the cpu being brought down during suspend? Because I get the warning only during
> >>>> suspend when the cpu is inactive already (but still online).
> >>>
> >>> It's hard to tell, I haven't been able to reproduce on suspend to disk/mem.
> >>>
> >>> Does this script eventually trigger it after some time?
> >>
> >> Any update to this?
> > 
> > Heiner? Can you please test the script I sent to you?
> > 
> > Thanks.
> > 
> Sorry, took some time .. And yes, running your script triggers the message too.
> 
> [   25.646015] x86: Booting SMP configuration:
> [   25.646044] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.664491] smpboot: CPU 1 is now offline
> [   25.679299] x86: Booting SMP configuration:
> [   25.679329] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.698449] smpboot: CPU 1 is now offline
> [   25.711698] x86: Booting SMP configuration:
> [   25.711727] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.729185] NOHZ: local_softirq_pending 202
> [   25.729229] NOHZ: local_softirq_pending 202
> [   25.730759] smpboot: CPU 1 is now offline
> [   25.744053] x86: Booting SMP configuration:
> [   25.744083] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.762520] smpboot: CPU 1 is now offline
> [   25.776834] x86: Booting SMP configuration:
> [   25.776863] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.794189] NOHZ: local_softirq_pending 202
> [   25.796672] smpboot: CPU 1 is now offline
> [   25.805970] x86: Booting SMP configuration:
> [   25.805999] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.827360] smpboot: CPU 1 is now offline
> [   25.839043] x86: Booting SMP configuration:
> [   25.839073] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.858184] NOHZ: local_softirq_pending 202
> [   25.862182] smpboot: CPU 1 is now offline
> [   25.873759] x86: Booting SMP configuration:
> [   25.873789] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   25.893385] smpboot: CPU 1 is now offline
> 

Sorry, I got sidetracked and almost forgot about it.

So this is triggered by CPU hotplug. At some point the CPU has an
opportunity to go idle and for some reason the timer softirq is still
pending. We need to know which timer this is about and why the timer
softirq keeps pending.

I'm going to need your help again. Can you please run the following (possibly
repeat until it triggers the bug) ?

   echo 1 > /sys/devices/system/cpu/cpu1/online

   # Pause and reset tracing
   echo 0 > /sys/kernel/debug/tracing/tracing_on
   echo > /sys/kernel/debug/tracing/trace

   # Turn on relevant events
   echo 1 > /sys/kernel/debug/tracing/events/timer/timer_*/enable
   echo 1 > /sys/kernel/debug/tracing/events/irq/softirq_*/enable
   echo 1 > /sys/kernel/debug/tracing/tracing_on

   # Trigger
   echo 0 > /sys/devices/system/cpu/cpu1/online

   echo 0 > /sys/kernel/debug/tracing/tracing_on

And please apply the following patch before. With all that I'll have the
relevant informations stored in /sys/kernel/debug/tracing/per_cpu/cpu1/trace
Please send its content to me. Thanks!

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 69e673b..0e57a3b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -892,6 +892,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
 			pr_warn("NOHZ: local_softirq_pending %02x\n",
 				(unsigned int) local_softirq_pending());
+			trace_dump_stack(0);
 			ratelimit++;
 		}
 		return false;


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-12-27  6:53                   ` Frederic Weisbecker
@ 2018-12-27 23:11                     ` Heiner Kallweit
  2018-12-28  1:31                       ` Frederic Weisbecker
  2019-01-16  6:24                       ` Frederic Weisbecker
  0 siblings, 2 replies; 29+ messages in thread
From: Heiner Kallweit @ 2018-12-27 23:11 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Anna-Maria Gleixner, Linux Kernel Mailing List,
	Grygorii Strashko

On 27.12.2018 07:53, Frederic Weisbecker wrote:
> On Mon, Oct 15, 2018 at 10:58:54PM +0200, Heiner Kallweit wrote:
>> On 28.09.2018 15:18, Frederic Weisbecker wrote:
>>> On Thu, Sep 27, 2018 at 06:05:46PM +0200, Thomas Gleixner wrote:
>>>> On Tue, 28 Aug 2018, Frederic Weisbecker wrote:
>>>>> On Fri, Aug 24, 2018 at 07:06:32PM +0200, Heiner Kallweit wrote:
>>>>>> I tested it and Frederic is right, it doesn't help. Can it be somehow related to
>>>>>> the cpu being brought down during suspend? Because I get the warning only during
>>>>>> suspend when the cpu is inactive already (but still online).
>>>>>
>>>>> It's hard to tell, I haven't been able to reproduce on suspend to disk/mem.
>>>>>
>>>>> Does this script eventually trigger it after some time?
>>>>
>>>> Any update to this?
>>>
>>> Heiner? Can you please test the script I sent to you?
>>>
>>> Thanks.
>>>
>> Sorry, took some time .. And yes, running your script triggers the message too.
>>
[...]
> 
> Sorry, I got sidetracked and almost forgot about it.
> 
> So this is triggered by CPU hotplug. At some point the CPU has an
> opportunity to go idle and for some reason the timer softirq is still
> pending. We need to know which timer this is about and why the timer
> softirq keeps pending.
> 
> I'm going to need your help again. Can you please run the following (possibly
> repeat until it triggers the bug) ?
> 
>    echo 1 > /sys/devices/system/cpu/cpu1/online
> 
>    # Pause and reset tracing
>    echo 0 > /sys/kernel/debug/tracing/tracing_on
>    echo > /sys/kernel/debug/tracing/trace
> 
>    # Turn on relevant events
>    echo 1 > /sys/kernel/debug/tracing/events/timer/timer_*/enable
>    echo 1 > /sys/kernel/debug/tracing/events/irq/softirq_*/enable
>    echo 1 > /sys/kernel/debug/tracing/tracing_on
> 
>    # Trigger
>    echo 0 > /sys/devices/system/cpu/cpu1/online
> 
>    echo 0 > /sys/kernel/debug/tracing/tracing_on
> 
> And please apply the following patch before. With all that I'll have the
> relevant informations stored in /sys/kernel/debug/tracing/per_cpu/cpu1/trace
> Please send its content to me. Thanks!
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 69e673b..0e57a3b 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -892,6 +892,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>  		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
>  			pr_warn("NOHZ: local_softirq_pending %02x\n",
>  				(unsigned int) local_softirq_pending());
> +			trace_dump_stack(0);
>  			ratelimit++;
>  		}
>  		return false;
> 
> 

OK, did as you advised and here comes the trace. That's the related dmesg part:

[ 1479.025092] x86: Booting SMP configuration:
[ 1479.025129] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 1479.094715] NOHZ: local_softirq_pending 202
[ 1479.096557] smpboot: CPU 1 is now offline

Hope it helps.
Heiner


# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
          <idle>-0     [001] d.h2  1479.099092: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] d.h2  1479.099098: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.099106: softirq_raise: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.099114: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.099120: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.099121: softirq_entry: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.099134: softirq_exit: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.099135: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.099141: softirq_exit: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.100094: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.100109: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.100116: softirq_exit: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.101091: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.101113: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.101118: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] d.h2  1479.102094: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.102114: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.102121: softirq_exit: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.103091: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] d.h2  1479.103097: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.103105: softirq_raise: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.103114: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.103118: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.103119: softirq_entry: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.103131: softirq_exit: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.103132: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.103138: softirq_exit: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.105092: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.105115: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.105119: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] d.h2  1479.106092: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.106112: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] .Ns2  1479.106144: softirq_exit: vec=9 [action=RCU]
         cpuhp/1-13    [001] d..2  1479.106279: timer_cancel: timer=0000000009a25653
          <idle>-0     [001] d.h2  1479.106965: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] d.h2  1479.106969: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.106974: softirq_raise: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.106981: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.106984: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.106985: softirq_entry: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.106994: softirq_exit: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.106995: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.106999: softirq_exit: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.107996: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.108010: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.108014: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] d.h2  1479.109009: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] d.h2  1479.109013: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.109024: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.109028: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.109028: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.109033: softirq_exit: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.110013: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.110033: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.110040: softirq_exit: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.111011: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] d.h2  1479.111017: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.111026: softirq_raise: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.111035: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.111040: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.111040: softirq_entry: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.111052: softirq_exit: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.111052: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] .Ns2  1479.111079: softirq_exit: vec=9 [action=RCU]
         cpuhp/1-13    [001] dNh2  1479.112930: softirq_raise: vec=1 [action=TIMER]
         cpuhp/1-13    [001] dNh2  1479.112935: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] d..1  1479.113077: <stack trace>
 => can_stop_idle_tick.isra.14
 => tick_nohz_get_sleep_length
 => menu_select
 => cpuidle_select
 => do_idle
 => cpu_startup_entry
 => start_secondary
 => secondary_startup_64
          <idle>-0     [001] .Ns2  1479.113110: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] .Ns2  1479.113114: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] .Ns2  1479.113115: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] .Ns2  1479.113139: softirq_exit: vec=9 [action=RCU]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-12-27 23:11                     ` Heiner Kallweit
@ 2018-12-28  1:31                       ` Frederic Weisbecker
  2018-12-28  6:34                         ` Heiner Kallweit
  2019-01-16  6:24                       ` Frederic Weisbecker
  1 sibling, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2018-12-28  1:31 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Thomas Gleixner, Anna-Maria Gleixner, Linux Kernel Mailing List,
	Grygorii Strashko

On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
> 
> OK, did as you advised and here comes the trace. That's the related dmesg part:
> 
> [ 1479.025092] x86: Booting SMP configuration:
> [ 1479.025129] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [ 1479.094715] NOHZ: local_softirq_pending 202
> [ 1479.096557] smpboot: CPU 1 is now offline
> 
> Hope it helps.
> Heiner
> 
> 
> # tracer: nop
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
[...]
>           <idle>-0     [001] d.h2  1479.111017: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.111026: softirq_raise: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.111035: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.111040: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.111040: softirq_entry: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.111052: softirq_exit: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.111052: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] .Ns2  1479.111079: softirq_exit: vec=9 [action=RCU]
>          cpuhp/1-13    [001] dNh2  1479.112930: softirq_raise: vec=1 [action=TIMER]
>          cpuhp/1-13    [001] dNh2  1479.112935: softirq_raise: vec=9 [action=RCU]

Interesting, the softirq is raised from hardirq but it's not handled in the end of
the IRQ. Are you running threaded IRQS by any chance? If so I would expect ksoftirqd
to handle the pending work before we go idle. However I can imagine a small window
where such an expectation may not be met: if the softirq is raised after the ksoftirqd
thread is parked (CPUHP_AP_SMPBOOT_THREADS), which is right before we disable the CPU
(CPUHP_TEARDOWN_CPU).

I don't know if we can afford to ignore a softirq even at this late stage. We should
probably avoid leaking any. So here is a possible fix, if you don't mind trying:

diff --git a/kernel/softirq.c b/kernel/softirq.c
index d288133..716096b 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -56,6 +56,7 @@ EXPORT_PER_CPU_SYMBOL(irq_stat);
 static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
 
 DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
+DEFINE_PER_CPU(int, ksoftirqd_parked);
 
 const char * const softirq_to_name[NR_SOFTIRQS] = {
 	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "IRQ_POLL",
@@ -363,7 +364,7 @@ static inline void invoke_softirq(void)
 	if (ksoftirqd_running(local_softirq_pending()))
 		return;
 
-	if (!force_irqthreads) {
+	if (!force_irqthreads || __this_cpu_read(ksoftirqd_parked)) {
 #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
 		/*
 		 * We can safely execute softirq on the current stack if
@@ -659,6 +660,22 @@ static void run_ksoftirqd(unsigned int cpu)
 	local_irq_enable();
 }
 
+static void ksoftirqd_park(unsigned int cpu)
+{
+	local_irq_disable();
+	__this_cpu_write(ksoftirqd_parked, 1);
+
+	if (local_softirq_pending())
+		run_ksoftirqd(cpu);
+
+	local_irq_enable();
+}
+
+static void ksoftirqd_unpark(unsigned int cpu)
+{
+	__this_cpu_write(ksoftirqd_parked, 0);
+}
+
 #ifdef CONFIG_HOTPLUG_CPU
 /*
  * tasklet_kill_immediate is called to remove a tasklet which can already be
@@ -724,6 +741,8 @@ static int takeover_tasklets(unsigned int cpu)
 static struct smp_hotplug_thread softirq_threads = {
 	.store			= &ksoftirqd,
 	.thread_should_run	= ksoftirqd_should_run,
+	.park			= ksoftirqd_park,
+	.unpark			= ksoftirqd_unpark,
 	.thread_fn		= run_ksoftirqd,
 	.thread_comm		= "ksoftirqd/%u",
 };

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-12-28  1:31                       ` Frederic Weisbecker
@ 2018-12-28  6:34                         ` Heiner Kallweit
  2018-12-28  6:39                           ` Heiner Kallweit
  0 siblings, 1 reply; 29+ messages in thread
From: Heiner Kallweit @ 2018-12-28  6:34 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Anna-Maria Gleixner, Linux Kernel Mailing List,
	Grygorii Strashko

On 28.12.2018 02:31, Frederic Weisbecker wrote:
> On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
>>
[...]
> 
> Interesting, the softirq is raised from hardirq but it's not handled in the end of
> the IRQ. Are you running threaded IRQS by any chance? If so I would expect ksoftirqd
> to handle the pending work before we go idle. However I can imagine a small window
> where such an expectation may not be met: if the softirq is raised after the ksoftirqd
> thread is parked (CPUHP_AP_SMPBOOT_THREADS), which is right before we disable the CPU
> (CPUHP_TEARDOWN_CPU).
> 
I have a network driver (r8169) using NAPI which runs in softirq context AFAIK.
For testing purposes I sometimes trigger system suspend via network, so there is
network adapter activity when system suspends. Apart from that nothing really
exciting:
            CPU0       CPU1       CPU2       CPU3
   0:         43          0          0          0   IO-APIC    2-edge      timer
   1:          4          0          0          0   IO-APIC    1-edge      i8042
   8:          0          1          0          0   IO-APIC    8-fasteoi   rtc0
   9:          0          0          0          0   IO-APIC    9-fasteoi   acpi
  12:          0          0          0          5   IO-APIC   12-edge      i8042
 120:          0          0          0          0   PCI-MSI 311296-edge      PCIe PME
 121:          0          0          0          0   PCI-MSI 315392-edge      PCIe PME
 122:          0          0          0          0   PCI-MSI 327680-edge      PCIe PME
 123:          0          0       3328          0   PCI-MSI 294912-edge      ahci[0000:00:12.0]
 124:          0        133          0          0   PCI-MSI 344064-edge      xhci_hcd
 125:          0          0         32          0   PCI-MSI 245760-edge      mei_me
 127:        381          0          0          0   PCI-MSI 1572864-edge      enp3s0
 128:          0          0          0        236   PCI-MSI 32768-edge      i915
 129:          0        374          0          0   PCI-MSI 229376-edge      snd_hda_intel:card0

> I don't know if we can afford to ignore a softirq even at this late stage. We should
> probably avoid leaking any. So here is a possible fix, if you don't mind trying:
> 
I tested your patch and at least in the first minutes of testing couldn't reproduce
the issue any longer. I tested manual system suspend and the following script you
sent when we started to analyze the issue.

Heiner

--------------------------------------------------------------------------

#!/bin/bash

do_hotplug()
{
	for i in $(seq 1 $2)
	do
		echo $1 > /sys/devices/system/cpu/cpu$i/online
	done
}

LAST_CPU=$(($(nproc)-1))

while true
do
	do_hotplug 0 $LAST_CPU
	do_hotplug 1 $LAST_CPU
done

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-12-28  6:34                         ` Heiner Kallweit
@ 2018-12-28  6:39                           ` Heiner Kallweit
  2019-01-09 22:20                             ` Heiner Kallweit
  0 siblings, 1 reply; 29+ messages in thread
From: Heiner Kallweit @ 2018-12-28  6:39 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Anna-Maria Gleixner, Linux Kernel Mailing List,
	Grygorii Strashko

On 28.12.2018 07:34, Heiner Kallweit wrote:
> On 28.12.2018 02:31, Frederic Weisbecker wrote:
>> On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
>>>
> [...]
>>
>> Interesting, the softirq is raised from hardirq but it's not handled in the end of
>> the IRQ. Are you running threaded IRQS by any chance? If so I would expect ksoftirqd
>> to handle the pending work before we go idle. However I can imagine a small window
>> where such an expectation may not be met: if the softirq is raised after the ksoftirqd
>> thread is parked (CPUHP_AP_SMPBOOT_THREADS), which is right before we disable the CPU
>> (CPUHP_TEARDOWN_CPU).
>>
> I have a network driver (r8169) using NAPI which runs in softirq context AFAIK.
> For testing purposes I sometimes trigger system suspend via network, so there is
> network adapter activity when system suspends. Apart from that nothing really
> exciting:
>             CPU0       CPU1       CPU2       CPU3
>    0:         43          0          0          0   IO-APIC    2-edge      timer
>    1:          4          0          0          0   IO-APIC    1-edge      i8042
>    8:          0          1          0          0   IO-APIC    8-fasteoi   rtc0
>    9:          0          0          0          0   IO-APIC    9-fasteoi   acpi
>   12:          0          0          0          5   IO-APIC   12-edge      i8042
>  120:          0          0          0          0   PCI-MSI 311296-edge      PCIe PME
>  121:          0          0          0          0   PCI-MSI 315392-edge      PCIe PME
>  122:          0          0          0          0   PCI-MSI 327680-edge      PCIe PME
>  123:          0          0       3328          0   PCI-MSI 294912-edge      ahci[0000:00:12.0]
>  124:          0        133          0          0   PCI-MSI 344064-edge      xhci_hcd
>  125:          0          0         32          0   PCI-MSI 245760-edge      mei_me
>  127:        381          0          0          0   PCI-MSI 1572864-edge      enp3s0
>  128:          0          0          0        236   PCI-MSI 32768-edge      i915
>  129:          0        374          0          0   PCI-MSI 229376-edge      snd_hda_intel:card0
> 
>> I don't know if we can afford to ignore a softirq even at this late stage. We should
>> probably avoid leaking any. So here is a possible fix, if you don't mind trying:
>>
> I tested your patch and at least in the first minutes of testing couldn't reproduce
> the issue any longer. I tested manual system suspend and the following script you
> sent when we started to analyze the issue.
> 

Also after some more time the issue didn't occur again. So it seems your analysis
was right and also the approach to fix it. Thanks!
Will let you know in case the issue should pop up again under special
circumstances.


> Heiner
> 
> --------------------------------------------------------------------------
> 
> #!/bin/bash
> 
> do_hotplug()
> {
> 	for i in $(seq 1 $2)
> 	do
> 		echo $1 > /sys/devices/system/cpu/cpu$i/online
> 	done
> }
> 
> LAST_CPU=$(($(nproc)-1))
> 
> while true
> do
> 	do_hotplug 0 $LAST_CPU
> 	do_hotplug 1 $LAST_CPU
> done
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-12-28  6:39                           ` Heiner Kallweit
@ 2019-01-09 22:20                             ` Heiner Kallweit
  2019-01-11 21:36                               ` Frederic Weisbecker
  0 siblings, 1 reply; 29+ messages in thread
From: Heiner Kallweit @ 2019-01-09 22:20 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Anna-Maria Gleixner, Linux Kernel Mailing List,
	Grygorii Strashko

On 28.12.2018 07:39, Heiner Kallweit wrote:
> On 28.12.2018 07:34, Heiner Kallweit wrote:
>> On 28.12.2018 02:31, Frederic Weisbecker wrote:
>>> On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
>>>>
>> [...]
>>>
>>> Interesting, the softirq is raised from hardirq but it's not handled in the end of
>>> the IRQ. Are you running threaded IRQS by any chance? If so I would expect ksoftirqd
>>> to handle the pending work before we go idle. However I can imagine a small window
>>> where such an expectation may not be met: if the softirq is raised after the ksoftirqd
>>> thread is parked (CPUHP_AP_SMPBOOT_THREADS), which is right before we disable the CPU
>>> (CPUHP_TEARDOWN_CPU).
>>>
>> I have a network driver (r8169) using NAPI which runs in softirq context AFAIK.
>> For testing purposes I sometimes trigger system suspend via network, so there is
>> network adapter activity when system suspends. Apart from that nothing really
>> exciting:
>>             CPU0       CPU1       CPU2       CPU3
>>    0:         43          0          0          0   IO-APIC    2-edge      timer
>>    1:          4          0          0          0   IO-APIC    1-edge      i8042
>>    8:          0          1          0          0   IO-APIC    8-fasteoi   rtc0
>>    9:          0          0          0          0   IO-APIC    9-fasteoi   acpi
>>   12:          0          0          0          5   IO-APIC   12-edge      i8042
>>  120:          0          0          0          0   PCI-MSI 311296-edge      PCIe PME
>>  121:          0          0          0          0   PCI-MSI 315392-edge      PCIe PME
>>  122:          0          0          0          0   PCI-MSI 327680-edge      PCIe PME
>>  123:          0          0       3328          0   PCI-MSI 294912-edge      ahci[0000:00:12.0]
>>  124:          0        133          0          0   PCI-MSI 344064-edge      xhci_hcd
>>  125:          0          0         32          0   PCI-MSI 245760-edge      mei_me
>>  127:        381          0          0          0   PCI-MSI 1572864-edge      enp3s0
>>  128:          0          0          0        236   PCI-MSI 32768-edge      i915
>>  129:          0        374          0          0   PCI-MSI 229376-edge      snd_hda_intel:card0
>>
>>> I don't know if we can afford to ignore a softirq even at this late stage. We should
>>> probably avoid leaking any. So here is a possible fix, if you don't mind trying:
>>>
>> I tested your patch and at least in the first minutes of testing couldn't reproduce
>> the issue any longer. I tested manual system suspend and the following script you
>> sent when we started to analyze the issue.
>>
> 
> Also after some more time the issue didn't occur again. So it seems your analysis
> was right and also the approach to fix it. Thanks!
> Will let you know in case the issue should pop up again under special
> circumstances.
> 
Frederic, so far this fix didn't appear in linux-next, are you going to submit it?

> 
>> Heiner
>>
>> --------------------------------------------------------------------------
>>
>> #!/bin/bash
>>
>> do_hotplug()
>> {
>> 	for i in $(seq 1 $2)
>> 	do
>> 		echo $1 > /sys/devices/system/cpu/cpu$i/online
>> 	done
>> }
>>
>> LAST_CPU=$(($(nproc)-1))
>>
>> while true
>> do
>> 	do_hotplug 0 $LAST_CPU
>> 	do_hotplug 1 $LAST_CPU
>> done
>>
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2019-01-09 22:20                             ` Heiner Kallweit
@ 2019-01-11 21:36                               ` Frederic Weisbecker
  0 siblings, 0 replies; 29+ messages in thread
From: Frederic Weisbecker @ 2019-01-11 21:36 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Thomas Gleixner, Anna-Maria Gleixner, Linux Kernel Mailing List,
	Grygorii Strashko

On Wed, Jan 09, 2019 at 11:20:50PM +0100, Heiner Kallweit wrote:
> On 28.12.2018 07:39, Heiner Kallweit wrote:
> > On 28.12.2018 07:34, Heiner Kallweit wrote:
> >> On 28.12.2018 02:31, Frederic Weisbecker wrote:
> >>> On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
> >>>>
> >> [...]
> >>>
> >>> Interesting, the softirq is raised from hardirq but it's not handled in the end of
> >>> the IRQ. Are you running threaded IRQS by any chance? If so I would expect ksoftirqd
> >>> to handle the pending work before we go idle. However I can imagine a small window
> >>> where such an expectation may not be met: if the softirq is raised after the ksoftirqd
> >>> thread is parked (CPUHP_AP_SMPBOOT_THREADS), which is right before we disable the CPU
> >>> (CPUHP_TEARDOWN_CPU).
> >>>
> >> I have a network driver (r8169) using NAPI which runs in softirq context AFAIK.
> >> For testing purposes I sometimes trigger system suspend via network, so there is
> >> network adapter activity when system suspends. Apart from that nothing really
> >> exciting:
> >>             CPU0       CPU1       CPU2       CPU3
> >>    0:         43          0          0          0   IO-APIC    2-edge      timer
> >>    1:          4          0          0          0   IO-APIC    1-edge      i8042
> >>    8:          0          1          0          0   IO-APIC    8-fasteoi   rtc0
> >>    9:          0          0          0          0   IO-APIC    9-fasteoi   acpi
> >>   12:          0          0          0          5   IO-APIC   12-edge      i8042
> >>  120:          0          0          0          0   PCI-MSI 311296-edge      PCIe PME
> >>  121:          0          0          0          0   PCI-MSI 315392-edge      PCIe PME
> >>  122:          0          0          0          0   PCI-MSI 327680-edge      PCIe PME
> >>  123:          0          0       3328          0   PCI-MSI 294912-edge      ahci[0000:00:12.0]
> >>  124:          0        133          0          0   PCI-MSI 344064-edge      xhci_hcd
> >>  125:          0          0         32          0   PCI-MSI 245760-edge      mei_me
> >>  127:        381          0          0          0   PCI-MSI 1572864-edge      enp3s0
> >>  128:          0          0          0        236   PCI-MSI 32768-edge      i915
> >>  129:          0        374          0          0   PCI-MSI 229376-edge      snd_hda_intel:card0
> >>
> >>> I don't know if we can afford to ignore a softirq even at this late stage. We should
> >>> probably avoid leaking any. So here is a possible fix, if you don't mind trying:
> >>>
> >> I tested your patch and at least in the first minutes of testing couldn't reproduce
> >> the issue any longer. I tested manual system suspend and the following script you
> >> sent when we started to analyze the issue.
> >>
> > 
> > Also after some more time the issue didn't occur again. So it seems your analysis
> > was right and also the approach to fix it. Thanks!
> > Will let you know in case the issue should pop up again under special
> > circumstances.
> > 
> Frederic, so far this fix didn't appear in linux-next, are you going to submit it?

Yep, I'll cook up a proper changelog and let Thomas judge if the change is worth.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2018-12-27 23:11                     ` Heiner Kallweit
  2018-12-28  1:31                       ` Frederic Weisbecker
@ 2019-01-16  6:24                       ` Frederic Weisbecker
  2019-01-16 18:42                         ` Heiner Kallweit
  2019-01-24 19:37                         ` Heiner Kallweit
  1 sibling, 2 replies; 29+ messages in thread
From: Frederic Weisbecker @ 2019-01-16  6:24 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Thomas Gleixner, Anna-Maria Gleixner, Linux Kernel Mailing List

On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
> 
> # tracer: nop
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>           <idle>-0     [001] d.h2  1479.099092: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] d.h2  1479.099098: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.099106: softirq_raise: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.099114: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.099120: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.099121: softirq_entry: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.099134: softirq_exit: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.099135: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.099141: softirq_exit: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.100094: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.100109: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.100116: softirq_exit: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.101091: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.101113: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.101118: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] d.h2  1479.102094: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.102114: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.102121: softirq_exit: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.103091: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] d.h2  1479.103097: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.103105: softirq_raise: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.103114: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.103118: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.103119: softirq_entry: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.103131: softirq_exit: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.103132: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.103138: softirq_exit: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.105092: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.105115: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.105119: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] d.h2  1479.106092: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.106112: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] .Ns2  1479.106144: softirq_exit: vec=9 [action=RCU]
>          cpuhp/1-13    [001] d..2  1479.106279: timer_cancel: timer=0000000009a25653
>           <idle>-0     [001] d.h2  1479.106965: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] d.h2  1479.106969: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.106974: softirq_raise: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.106981: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.106984: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.106985: softirq_entry: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.106994: softirq_exit: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.106995: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.106999: softirq_exit: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.107996: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.108010: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.108014: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] d.h2  1479.109009: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] d.h2  1479.109013: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.109024: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.109028: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.109028: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.109033: softirq_exit: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.110013: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.110033: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] ..s2  1479.110040: softirq_exit: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.111011: softirq_raise: vec=1 [action=TIMER]
>           <idle>-0     [001] d.h2  1479.111017: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] d.h2  1479.111026: softirq_raise: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.111035: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.111040: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] ..s2  1479.111040: softirq_entry: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.111052: softirq_exit: vec=7 [action=SCHED]
>           <idle>-0     [001] ..s2  1479.111052: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] .Ns2  1479.111079: softirq_exit: vec=9 [action=RCU]
>          cpuhp/1-13    [001] dNh2  1479.112930: softirq_raise: vec=1 [action=TIMER]
>          cpuhp/1-13    [001] dNh2  1479.112935: softirq_raise: vec=9 [action=RCU]
>           <idle>-0     [001] d..1  1479.113077: <stack trace>
>  => can_stop_idle_tick.isra.14
>  => tick_nohz_get_sleep_length
>  => menu_select
>  => cpuidle_select
>  => do_idle
>  => cpu_startup_entry
>  => start_secondary
>  => secondary_startup_64
>           <idle>-0     [001] .Ns2  1479.113110: softirq_entry: vec=1 [action=TIMER]
>           <idle>-0     [001] .Ns2  1479.113114: softirq_exit: vec=1 [action=TIMER]
>           <idle>-0     [001] .Ns2  1479.113115: softirq_entry: vec=9 [action=RCU]
>           <idle>-0     [001] .Ns2  1479.113139: softirq_exit: vec=9 [action=RCU]


So, I'm being very doubtful now actually.

First of all, I realize my patch only fixes things halfway. Only softirq raised from
hardirqs are going to be handled after ksoftirqd is parked. If it's raised outside
interrupts, the softirq is going to be ignored. If other smpboot kthreads, following
ksoftirqd parking, raise softirqs from their park() callback we should at least warn
that we are ignoring that.

Also, I'm still not sure if you're running threaded IRQs. Very likely not as
the softirqs in your traces are executed in idle, which means from an IRQ interrupting
idle. You haven't passed "threadirqs" in your kernel parameters, right?
In any case we need to figure out why your softirqs raised at 1479.112930 and
1479.112935 don't get handled.

Finally I don't understand why you have so many timer softirqs and yet no timer traces
at all, I turned them on in the script I gave you. So something weird is happening with
the timers code. At best we are wasting lots of CPU cycles with those empty softirqs, at
worst we are miscalculating some expiration somewhere, possibly ignoring a timer.

So I'll need to cook up an updated debug patch for you to trigger if you don't
mind.

Thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2019-01-16  6:24                       ` Frederic Weisbecker
@ 2019-01-16 18:42                         ` Heiner Kallweit
  2019-01-24 19:37                         ` Heiner Kallweit
  1 sibling, 0 replies; 29+ messages in thread
From: Heiner Kallweit @ 2019-01-16 18:42 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Anna-Maria Gleixner, Linux Kernel Mailing List

On 16.01.2019 07:24, Frederic Weisbecker wrote:
> On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
>>
>> # tracer: nop
>> #
>> #                              _-----=> irqs-off
>> #                             / _----=> need-resched
>> #                            | / _---=> hardirq/softirq
>> #                            || / _--=> preempt-depth
>> #                            ||| /     delay
>> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>> #              | |       |   ||||       |         |
>>           <idle>-0     [001] d.h2  1479.099092: softirq_raise: vec=1 [action=TIMER]
>>           <idle>-0     [001] d.h2  1479.099098: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.099106: softirq_raise: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.099114: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.099120: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.099121: softirq_entry: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.099134: softirq_exit: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.099135: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.099141: softirq_exit: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.100094: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.100109: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.100116: softirq_exit: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.101091: softirq_raise: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.101113: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.101118: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] d.h2  1479.102094: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.102114: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.102121: softirq_exit: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.103091: softirq_raise: vec=1 [action=TIMER]
>>           <idle>-0     [001] d.h2  1479.103097: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.103105: softirq_raise: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.103114: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.103118: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.103119: softirq_entry: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.103131: softirq_exit: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.103132: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.103138: softirq_exit: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.105092: softirq_raise: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.105115: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.105119: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] d.h2  1479.106092: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.106112: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] .Ns2  1479.106144: softirq_exit: vec=9 [action=RCU]
>>          cpuhp/1-13    [001] d..2  1479.106279: timer_cancel: timer=0000000009a25653
>>           <idle>-0     [001] d.h2  1479.106965: softirq_raise: vec=1 [action=TIMER]
>>           <idle>-0     [001] d.h2  1479.106969: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.106974: softirq_raise: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.106981: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.106984: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.106985: softirq_entry: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.106994: softirq_exit: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.106995: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.106999: softirq_exit: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.107996: softirq_raise: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.108010: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.108014: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] d.h2  1479.109009: softirq_raise: vec=1 [action=TIMER]
>>           <idle>-0     [001] d.h2  1479.109013: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.109024: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.109028: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.109028: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.109033: softirq_exit: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.110013: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.110033: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.110040: softirq_exit: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.111011: softirq_raise: vec=1 [action=TIMER]
>>           <idle>-0     [001] d.h2  1479.111017: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.111026: softirq_raise: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.111035: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.111040: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.111040: softirq_entry: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.111052: softirq_exit: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.111052: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] .Ns2  1479.111079: softirq_exit: vec=9 [action=RCU]
>>          cpuhp/1-13    [001] dNh2  1479.112930: softirq_raise: vec=1 [action=TIMER]
>>          cpuhp/1-13    [001] dNh2  1479.112935: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] d..1  1479.113077: <stack trace>
>>  => can_stop_idle_tick.isra.14
>>  => tick_nohz_get_sleep_length
>>  => menu_select
>>  => cpuidle_select
>>  => do_idle
>>  => cpu_startup_entry
>>  => start_secondary
>>  => secondary_startup_64
>>           <idle>-0     [001] .Ns2  1479.113110: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] .Ns2  1479.113114: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] .Ns2  1479.113115: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] .Ns2  1479.113139: softirq_exit: vec=9 [action=RCU]
> 
> 
> So, I'm being very doubtful now actually.
> 
> First of all, I realize my patch only fixes things halfway. Only softirq raised from
> hardirqs are going to be handled after ksoftirqd is parked. If it's raised outside
> interrupts, the softirq is going to be ignored. If other smpboot kthreads, following
> ksoftirqd parking, raise softirqs from their park() callback we should at least warn
> that we are ignoring that.
> 
> Also, I'm still not sure if you're running threaded IRQs. Very likely not as
> the softirqs in your traces are executed in idle, which means from an IRQ interrupting
> idle. You haven't passed "threadirqs" in your kernel parameters, right?
> In any case we need to figure out why your softirqs raised at 1479.112930 and
> 1479.112935 don't get handled.
> 
The only softirq I'm aware of is network NAPI processing. And no, I have no kernel
command line parameter "threadirqs" set.

> Finally I don't understand why you have so many timer softirqs and yet no timer traces
> at all, I turned them on in the script I gave you. So something weird is happening with
> the timers code. At best we are wasting lots of CPU cycles with those empty softirqs, at
> worst we are miscalculating some expiration somewhere, possibly ignoring a timer.
> 
> So I'll need to cook up an updated debug patch for you to trigger if you don't
> mind.
> 
I don't mind ..

> Thanks.
> 
Heiner

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2019-01-16  6:24                       ` Frederic Weisbecker
  2019-01-16 18:42                         ` Heiner Kallweit
@ 2019-01-24 19:37                         ` Heiner Kallweit
  2019-02-14 19:05                           ` Heiner Kallweit
  1 sibling, 1 reply; 29+ messages in thread
From: Heiner Kallweit @ 2019-01-24 19:37 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Anna-Maria Gleixner, Linux Kernel Mailing List

On 16.01.2019 07:24, Frederic Weisbecker wrote:
> On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
>>
>> # tracer: nop
>> #
>> #                              _-----=> irqs-off
>> #                             / _----=> need-resched
>> #                            | / _---=> hardirq/softirq
>> #                            || / _--=> preempt-depth
>> #                            ||| /     delay
>> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>> #              | |       |   ||||       |         |
>>           <idle>-0     [001] d.h2  1479.099092: softirq_raise: vec=1 [action=TIMER]
>>           <idle>-0     [001] d.h2  1479.099098: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.099106: softirq_raise: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.099114: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.099120: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.099121: softirq_entry: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.099134: softirq_exit: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.099135: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.099141: softirq_exit: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.100094: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.100109: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.100116: softirq_exit: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.101091: softirq_raise: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.101113: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.101118: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] d.h2  1479.102094: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.102114: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.102121: softirq_exit: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.103091: softirq_raise: vec=1 [action=TIMER]
>>           <idle>-0     [001] d.h2  1479.103097: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.103105: softirq_raise: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.103114: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.103118: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.103119: softirq_entry: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.103131: softirq_exit: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.103132: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.103138: softirq_exit: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.105092: softirq_raise: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.105115: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.105119: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] d.h2  1479.106092: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.106112: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] .Ns2  1479.106144: softirq_exit: vec=9 [action=RCU]
>>          cpuhp/1-13    [001] d..2  1479.106279: timer_cancel: timer=0000000009a25653
>>           <idle>-0     [001] d.h2  1479.106965: softirq_raise: vec=1 [action=TIMER]
>>           <idle>-0     [001] d.h2  1479.106969: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.106974: softirq_raise: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.106981: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.106984: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.106985: softirq_entry: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.106994: softirq_exit: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.106995: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.106999: softirq_exit: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.107996: softirq_raise: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.108010: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.108014: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] d.h2  1479.109009: softirq_raise: vec=1 [action=TIMER]
>>           <idle>-0     [001] d.h2  1479.109013: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.109024: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.109028: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.109028: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.109033: softirq_exit: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.110013: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.110033: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] ..s2  1479.110040: softirq_exit: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.111011: softirq_raise: vec=1 [action=TIMER]
>>           <idle>-0     [001] d.h2  1479.111017: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] d.h2  1479.111026: softirq_raise: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.111035: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.111040: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] ..s2  1479.111040: softirq_entry: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.111052: softirq_exit: vec=7 [action=SCHED]
>>           <idle>-0     [001] ..s2  1479.111052: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] .Ns2  1479.111079: softirq_exit: vec=9 [action=RCU]
>>          cpuhp/1-13    [001] dNh2  1479.112930: softirq_raise: vec=1 [action=TIMER]
>>          cpuhp/1-13    [001] dNh2  1479.112935: softirq_raise: vec=9 [action=RCU]
>>           <idle>-0     [001] d..1  1479.113077: <stack trace>
>>  => can_stop_idle_tick.isra.14
>>  => tick_nohz_get_sleep_length
>>  => menu_select
>>  => cpuidle_select
>>  => do_idle
>>  => cpu_startup_entry
>>  => start_secondary
>>  => secondary_startup_64
>>           <idle>-0     [001] .Ns2  1479.113110: softirq_entry: vec=1 [action=TIMER]
>>           <idle>-0     [001] .Ns2  1479.113114: softirq_exit: vec=1 [action=TIMER]
>>           <idle>-0     [001] .Ns2  1479.113115: softirq_entry: vec=9 [action=RCU]
>>           <idle>-0     [001] .Ns2  1479.113139: softirq_exit: vec=9 [action=RCU]
> 
> 
> So, I'm being very doubtful now actually.
> 
> First of all, I realize my patch only fixes things halfway. Only softirq raised from
> hardirqs are going to be handled after ksoftirqd is parked. If it's raised outside
> interrupts, the softirq is going to be ignored. If other smpboot kthreads, following
> ksoftirqd parking, raise softirqs from their park() callback we should at least warn
> that we are ignoring that.
> 
> Also, I'm still not sure if you're running threaded IRQs. Very likely not as
> the softirqs in your traces are executed in idle, which means from an IRQ interrupting
> idle. You haven't passed "threadirqs" in your kernel parameters, right?
> In any case we need to figure out why your softirqs raised at 1479.112930 and
> 1479.112935 don't get handled.
> 
> Finally I don't understand why you have so many timer softirqs and yet no timer traces
> at all, I turned them on in the script I gave you. So something weird is happening with
> the timers code. At best we are wasting lots of CPU cycles with those empty softirqs, at
> worst we are miscalculating some expiration somewhere, possibly ignoring a timer.
> 
> So I'll need to cook up an updated debug patch for you to trigger if you don't
> mind.
> 
Hi Frederic,
is the update debug patch still to come?

> Thanks.
> 
Heiner

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2019-01-24 19:37                         ` Heiner Kallweit
@ 2019-02-14 19:05                           ` Heiner Kallweit
  2019-02-14 21:47                             ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Heiner Kallweit @ 2019-02-14 19:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Anna-Maria Gleixner, Linux Kernel Mailing List

On 24.01.2019 20:37, Heiner Kallweit wrote:
> On 16.01.2019 07:24, Frederic Weisbecker wrote:
>> On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
>>>
>>> # tracer: nop
>>> #
>>> #                              _-----=> irqs-off
>>> #                             / _----=> need-resched
>>> #                            | / _---=> hardirq/softirq
>>> #                            || / _--=> preempt-depth
>>> #                            ||| /     delay
>>> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>>> #              | |       |   ||||       |         |
>>>           <idle>-0     [001] d.h2  1479.099092: softirq_raise: vec=1 [action=TIMER]
>>>           <idle>-0     [001] d.h2  1479.099098: softirq_raise: vec=9 [action=RCU]
>>>           <idle>-0     [001] d.h2  1479.099106: softirq_raise: vec=7 [action=SCHED]
>>>           <idle>-0     [001] ..s2  1479.099114: softirq_entry: vec=1 [action=TIMER]
>>>           <idle>-0     [001] ..s2  1479.099120: softirq_exit: vec=1 [action=TIMER]
>>>           <idle>-0     [001] ..s2  1479.099121: softirq_entry: vec=7 [action=SCHED]
>>>           <idle>-0     [001] ..s2  1479.099134: softirq_exit: vec=7 [action=SCHED]
>>>           <idle>-0     [001] ..s2  1479.099135: softirq_entry: vec=9 [action=RCU]
>>>           <idle>-0     [001] ..s2  1479.099141: softirq_exit: vec=9 [action=RCU]
>>>           <idle>-0     [001] d.h2  1479.100094: softirq_raise: vec=9 [action=RCU]
>>>           <idle>-0     [001] ..s2  1479.100109: softirq_entry: vec=9 [action=RCU]
>>>           <idle>-0     [001] ..s2  1479.100116: softirq_exit: vec=9 [action=RCU]
>>>           <idle>-0     [001] d.h2  1479.101091: softirq_raise: vec=1 [action=TIMER]
>>>           <idle>-0     [001] ..s2  1479.101113: softirq_entry: vec=1 [action=TIMER]
>>>           <idle>-0     [001] ..s2  1479.101118: softirq_exit: vec=1 [action=TIMER]
>>>           <idle>-0     [001] d.h2  1479.102094: softirq_raise: vec=9 [action=RCU]
>>>           <idle>-0     [001] ..s2  1479.102114: softirq_entry: vec=9 [action=RCU]
>>>           <idle>-0     [001] ..s2  1479.102121: softirq_exit: vec=9 [action=RCU]
>>>           <idle>-0     [001] d.h2  1479.103091: softirq_raise: vec=1 [action=TIMER]
>>>           <idle>-0     [001] d.h2  1479.103097: softirq_raise: vec=9 [action=RCU]
>>>           <idle>-0     [001] d.h2  1479.103105: softirq_raise: vec=7 [action=SCHED]
>>>           <idle>-0     [001] ..s2  1479.103114: softirq_entry: vec=1 [action=TIMER]
>>>           <idle>-0     [001] ..s2  1479.103118: softirq_exit: vec=1 [action=TIMER]
>>>           <idle>-0     [001] ..s2  1479.103119: softirq_entry: vec=7 [action=SCHED]
>>>           <idle>-0     [001] ..s2  1479.103131: softirq_exit: vec=7 [action=SCHED]
>>>           <idle>-0     [001] ..s2  1479.103132: softirq_entry: vec=9 [action=RCU]
>>>           <idle>-0     [001] ..s2  1479.103138: softirq_exit: vec=9 [action=RCU]
>>>           <idle>-0     [001] d.h2  1479.105092: softirq_raise: vec=1 [action=TIMER]
>>>           <idle>-0     [001] ..s2  1479.105115: softirq_entry: vec=1 [action=TIMER]
>>>           <idle>-0     [001] ..s2  1479.105119: softirq_exit: vec=1 [action=TIMER]
>>>           <idle>-0     [001] d.h2  1479.106092: softirq_raise: vec=9 [action=RCU]
>>>           <idle>-0     [001] ..s2  1479.106112: softirq_entry: vec=9 [action=RCU]
>>>           <idle>-0     [001] .Ns2  1479.106144: softirq_exit: vec=9 [action=RCU]
>>>          cpuhp/1-13    [001] d..2  1479.106279: timer_cancel: timer=0000000009a25653
>>>           <idle>-0     [001] d.h2  1479.106965: softirq_raise: vec=1 [action=TIMER]
>>>           <idle>-0     [001] d.h2  1479.106969: softirq_raise: vec=9 [action=RCU]
>>>           <idle>-0     [001] d.h2  1479.106974: softirq_raise: vec=7 [action=SCHED]
>>>           <idle>-0     [001] ..s2  1479.106981: softirq_entry: vec=1 [action=TIMER]
>>>           <idle>-0     [001] ..s2  1479.106984: softirq_exit: vec=1 [action=TIMER]
>>>           <idle>-0     [001] ..s2  1479.106985: softirq_entry: vec=7 [action=SCHED]
>>>           <idle>-0     [001] ..s2  1479.106994: softirq_exit: vec=7 [action=SCHED]
>>>           <idle>-0     [001] ..s2  1479.106995: softirq_entry: vec=9 [action=RCU]
>>>           <idle>-0     [001] ..s2  1479.106999: softirq_exit: vec=9 [action=RCU]
>>>           <idle>-0     [001] d.h2  1479.107996: softirq_raise: vec=1 [action=TIMER]
>>>           <idle>-0     [001] ..s2  1479.108010: softirq_entry: vec=1 [action=TIMER]
>>>           <idle>-0     [001] ..s2  1479.108014: softirq_exit: vec=1 [action=TIMER]
>>>           <idle>-0     [001] d.h2  1479.109009: softirq_raise: vec=1 [action=TIMER]
>>>           <idle>-0     [001] d.h2  1479.109013: softirq_raise: vec=9 [action=RCU]
>>>           <idle>-0     [001] ..s2  1479.109024: softirq_entry: vec=1 [action=TIMER]
>>>           <idle>-0     [001] ..s2  1479.109028: softirq_exit: vec=1 [action=TIMER]
>>>           <idle>-0     [001] ..s2  1479.109028: softirq_entry: vec=9 [action=RCU]
>>>           <idle>-0     [001] ..s2  1479.109033: softirq_exit: vec=9 [action=RCU]
>>>           <idle>-0     [001] d.h2  1479.110013: softirq_raise: vec=9 [action=RCU]
>>>           <idle>-0     [001] ..s2  1479.110033: softirq_entry: vec=9 [action=RCU]
>>>           <idle>-0     [001] ..s2  1479.110040: softirq_exit: vec=9 [action=RCU]
>>>           <idle>-0     [001] d.h2  1479.111011: softirq_raise: vec=1 [action=TIMER]
>>>           <idle>-0     [001] d.h2  1479.111017: softirq_raise: vec=9 [action=RCU]
>>>           <idle>-0     [001] d.h2  1479.111026: softirq_raise: vec=7 [action=SCHED]
>>>           <idle>-0     [001] ..s2  1479.111035: softirq_entry: vec=1 [action=TIMER]
>>>           <idle>-0     [001] ..s2  1479.111040: softirq_exit: vec=1 [action=TIMER]
>>>           <idle>-0     [001] ..s2  1479.111040: softirq_entry: vec=7 [action=SCHED]
>>>           <idle>-0     [001] ..s2  1479.111052: softirq_exit: vec=7 [action=SCHED]
>>>           <idle>-0     [001] ..s2  1479.111052: softirq_entry: vec=9 [action=RCU]
>>>           <idle>-0     [001] .Ns2  1479.111079: softirq_exit: vec=9 [action=RCU]
>>>          cpuhp/1-13    [001] dNh2  1479.112930: softirq_raise: vec=1 [action=TIMER]
>>>          cpuhp/1-13    [001] dNh2  1479.112935: softirq_raise: vec=9 [action=RCU]
>>>           <idle>-0     [001] d..1  1479.113077: <stack trace>
>>>  => can_stop_idle_tick.isra.14
>>>  => tick_nohz_get_sleep_length
>>>  => menu_select
>>>  => cpuidle_select
>>>  => do_idle
>>>  => cpu_startup_entry
>>>  => start_secondary
>>>  => secondary_startup_64
>>>           <idle>-0     [001] .Ns2  1479.113110: softirq_entry: vec=1 [action=TIMER]
>>>           <idle>-0     [001] .Ns2  1479.113114: softirq_exit: vec=1 [action=TIMER]
>>>           <idle>-0     [001] .Ns2  1479.113115: softirq_entry: vec=9 [action=RCU]
>>>           <idle>-0     [001] .Ns2  1479.113139: softirq_exit: vec=9 [action=RCU]
>>
>>
>> So, I'm being very doubtful now actually.
>>
>> First of all, I realize my patch only fixes things halfway. Only softirq raised from
>> hardirqs are going to be handled after ksoftirqd is parked. If it's raised outside
>> interrupts, the softirq is going to be ignored. If other smpboot kthreads, following
>> ksoftirqd parking, raise softirqs from their park() callback we should at least warn
>> that we are ignoring that.
>>
>> Also, I'm still not sure if you're running threaded IRQs. Very likely not as
>> the softirqs in your traces are executed in idle, which means from an IRQ interrupting
>> idle. You haven't passed "threadirqs" in your kernel parameters, right?
>> In any case we need to figure out why your softirqs raised at 1479.112930 and
>> 1479.112935 don't get handled.
>>
>> Finally I don't understand why you have so many timer softirqs and yet no timer traces
>> at all, I turned them on in the script I gave you. So something weird is happening with
>> the timers code. At best we are wasting lots of CPU cycles with those empty softirqs, at
>> worst we are miscalculating some expiration somewhere, possibly ignoring a timer.
>>
>> So I'll need to cook up an updated debug patch for you to trigger if you don't
>> mind.
>>
> Hi Frederic,
> is the update debug patch still to come?
> 
>> Thanks.
>>
> Heiner
> 
Hi Frederic,

before it's forgotten: If I can test more, just let me know.

Heiner

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2019-02-14 19:05                           ` Heiner Kallweit
@ 2019-02-14 21:47                             ` Thomas Gleixner
  2019-02-14 22:33                               ` Heiner Kallweit
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2019-02-14 21:47 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Frederic Weisbecker, Anna-Maria Gleixner, Linux Kernel Mailing List

On Thu, 14 Feb 2019, Heiner Kallweit wrote:
> On 24.01.2019 20:37, Heiner Kallweit wrote:
> >>
> > Hi Frederic,
> > is the update debug patch still to come?

Heiner, there is a related fix queued in

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/core

See also:
    https://lkml.kernel.org/r/20190128234625.78241-3-mka@chromium.org
    https://lkml.kernel.org/r/20190128234625.78241-2-mka@chromium.org

Can you please have a look?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2019-02-14 21:47                             ` Thomas Gleixner
@ 2019-02-14 22:33                               ` Heiner Kallweit
  2019-02-15  0:31                                 ` Frederic Weisbecker
  0 siblings, 1 reply; 29+ messages in thread
From: Heiner Kallweit @ 2019-02-14 22:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Frederic Weisbecker, Anna-Maria Gleixner, Linux Kernel Mailing List

On 14.02.2019 22:47, Thomas Gleixner wrote:
> On Thu, 14 Feb 2019, Heiner Kallweit wrote:
>> On 24.01.2019 20:37, Heiner Kallweit wrote:
>>>>
>>> Hi Frederic,
>>> is the update debug patch still to come?
> 
> Heiner, there is a related fix queued in
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/core
> 
> See also:
>     https://lkml.kernel.org/r/20190128234625.78241-3-mka@chromium.org
>     https://lkml.kernel.org/r/20190128234625.78241-2-mka@chromium.org
> 
> Can you please have a look?
> 
Great. And sure, I can test it over the weekend.

> Thanks,
> 
> 	tglx
> 
Rgds, Heiner

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2019-02-14 22:33                               ` Heiner Kallweit
@ 2019-02-15  0:31                                 ` Frederic Weisbecker
  2019-02-16  9:14                                   ` Heiner Kallweit
  0 siblings, 1 reply; 29+ messages in thread
From: Frederic Weisbecker @ 2019-02-15  0:31 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Thomas Gleixner, Anna-Maria Gleixner, Linux Kernel Mailing List

On Thu, Feb 14, 2019 at 11:33:44PM +0100, Heiner Kallweit wrote:
> On 14.02.2019 22:47, Thomas Gleixner wrote:
> > On Thu, 14 Feb 2019, Heiner Kallweit wrote:
> >> On 24.01.2019 20:37, Heiner Kallweit wrote:
> >>>>
> >>> Hi Frederic,
> >>> is the update debug patch still to come?
> > 
> > Heiner, there is a related fix queued in
> > 
> >    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/core
> > 
> > See also:
> >     https://lkml.kernel.org/r/20190128234625.78241-3-mka@chromium.org
> >     https://lkml.kernel.org/r/20190128234625.78241-2-mka@chromium.org
> > 
> > Can you please have a look?
> > 
> Great. And sure, I can test it over the weekend.

Thanks you, and sorry for the shutdown, I was buried into softirq code lately.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem
  2019-02-15  0:31                                 ` Frederic Weisbecker
@ 2019-02-16  9:14                                   ` Heiner Kallweit
  0 siblings, 0 replies; 29+ messages in thread
From: Heiner Kallweit @ 2019-02-16  9:14 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Anna-Maria Gleixner, Linux Kernel Mailing List

On 15.02.2019 01:31, Frederic Weisbecker wrote:
> On Thu, Feb 14, 2019 at 11:33:44PM +0100, Heiner Kallweit wrote:
>> On 14.02.2019 22:47, Thomas Gleixner wrote:
>>> On Thu, 14 Feb 2019, Heiner Kallweit wrote:
>>>> On 24.01.2019 20:37, Heiner Kallweit wrote:
>>>>>>
>>>>> Hi Frederic,
>>>>> is the update debug patch still to come?
>>>
>>> Heiner, there is a related fix queued in
>>>
>>>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/core
>>>
>>> See also:
>>>     https://lkml.kernel.org/r/20190128234625.78241-3-mka@chromium.org
>>>     https://lkml.kernel.org/r/20190128234625.78241-2-mka@chromium.org
>>>
>>> Can you please have a look?
>>>
>> Great. And sure, I can test it over the weekend.
> 
> Thanks you, and sorry for the shutdown, I was buried into softirq code lately.
> 
No worries, appreciate your efforts. I tested and wasn't able to reproduce
the issue. Looks good ..

Thanks, Heiner

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2019-02-16  9:14 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-16  6:13 Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem Heiner Kallweit
2018-08-18 11:26 ` Thomas Gleixner
2018-08-18 22:34   ` Heiner Kallweit
2018-08-24  4:12 ` Frederic Weisbecker
2018-08-24  5:59   ` Heiner Kallweit
2018-08-24  8:01     ` Thomas Gleixner
2018-08-24 14:30       ` Frederic Weisbecker
2018-08-24 17:06         ` Heiner Kallweit
2018-08-28  2:25           ` Frederic Weisbecker
2018-09-27 16:05             ` Thomas Gleixner
2018-09-28 13:18               ` Frederic Weisbecker
2018-09-28 20:35                 ` Heiner Kallweit
2018-10-15 20:58                 ` Heiner Kallweit
2018-12-24 21:11                   ` Heiner Kallweit
2018-12-27  6:53                   ` Frederic Weisbecker
2018-12-27 23:11                     ` Heiner Kallweit
2018-12-28  1:31                       ` Frederic Weisbecker
2018-12-28  6:34                         ` Heiner Kallweit
2018-12-28  6:39                           ` Heiner Kallweit
2019-01-09 22:20                             ` Heiner Kallweit
2019-01-11 21:36                               ` Frederic Weisbecker
2019-01-16  6:24                       ` Frederic Weisbecker
2019-01-16 18:42                         ` Heiner Kallweit
2019-01-24 19:37                         ` Heiner Kallweit
2019-02-14 19:05                           ` Heiner Kallweit
2019-02-14 21:47                             ` Thomas Gleixner
2019-02-14 22:33                               ` Heiner Kallweit
2019-02-15  0:31                                 ` Frederic Weisbecker
2019-02-16  9:14                                   ` Heiner Kallweit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).