linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
@ 2013-05-09 12:29 Jiri Kosina
  2013-05-09 12:50 ` Borislav Petkov
  2013-05-10  0:29 ` NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule Frederic Weisbecker
  0 siblings, 2 replies; 54+ messages in thread
From: Jiri Kosina @ 2013-05-09 12:29 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Tony Luck, Borislav Petkov, linux-kernel, x86

Hi,

I just got the warning below when resuming from hibernation with kernel 
that has NO_HZ_FULL_ALL=y. This is with topmost commit e0fd9affeb640.


[ ... snip ... ]
 PM: Hibernation mode set to 'shutdown'
 PM: Marking nosave pages: [mem 0x0009e000-0x000fffff]
 PM: Marking nosave pages: [mem 0x7c4a1000-0x7c4a6fff]
 PM: Marking nosave pages: [mem 0x7c5b7000-0x7c60efff]
 PM: Marking nosave pages: [mem 0x7c6c6000-0x7c9fefff]
 PM: Basic memory bitmaps created
 PM: Syncing filesystems ... done.
 Freezing user space processes ... (elapsed 0.01 seconds) done.
 PM: Preallocating image memory... done (allocated 194411 pages)
 PM: Allocated 777644 kbytes in 0.29 seconds (2681.53 MB/s)
 Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
 Suspending console(s) (use no_console_suspend to debug)
 mei_me 0000:00:03.0: suspend
 mei_me 0000:00:03.0: wait hw ready failed. status = 0x0
 PM: freeze of devices complete after 406.995 msecs
 PM: late freeze of devices complete after 0.543 msecs
 PM: noirq freeze of devices complete after 1.379 msecs
 Disabling non-boot CPUs ...
 smpboot: CPU 1 is now offline
 PM: Creating hibernation image:
 PM: Need to copy 193377 pages
 PM: Normal pages needed: 193377 + 1024, available pages: 316026
 microcode: CPU0 sig=0x10676, pf=0x80, revision=0x60f
 Enabling non-boot CPUs ...
 smpboot: Booting Node 0 Processor 1 APIC 0x1
 CPU1 microcode updated early to revision 0x60f, date = 2010-09-29
 Disabled fast string operations
 ------------[ cut here ]------------
 WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x59/0x60()
 Modules linked in: af_packet tun iptable_mangle xt_DSCP nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables cpufreq_conservative cpufreq_userspace cpufreq_powersave iTCO_wdt iTCO_vendor_support snd_hda_codec_conexant snd_hda_intel snd_hda_codec acpi_cpufreq mperf snd_hwdep kvm_intel kvm snd_pcm iwldvm thinkpad_acpi mac80211 snd_seq sg microcode iwlwifi snd_timer snd_seq_device cfg80211 i2c_i801 pcspkr rfkill lpc_ich e1000e mfd_core snd mei_me mei ptp snd_page_alloc ehci_pci pps_core wmi soundcore ac battery tpm_tis tpm tpm_bios autofs4 uhci_hcd i915 ehci_hcd drm_kms_helper drm i2c_algo_bit video usbcore button usb_common edd fan processor ata_generic thermal thermal_sys
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.9.0-11789-ge0fd9af #1
 Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
  000000000000007b ffff880079851da8 ffffffff81547dde ffff880079851de8
  ffffffff8104212b ffff880079851dd8 0000000000000001 ffff88007c28cca0
  ffff880079878000 000000010003fb57 0000000000000096 ffff880079851df8
 Call Trace:
  [<ffffffff81547dde>] dump_stack+0x19/0x1b
  [<ffffffff8104212b>] warn_slowpath_common+0x6b/0xa0
  [<ffffffff81042175>] warn_slowpath_null+0x15/0x20
  [<ffffffff81026b09>] native_smp_send_reschedule+0x59/0x60
  [<ffffffff810774dc>] wake_up_nohz_cpu+0x9c/0xa0
  [<ffffffff810529db>] add_timer_on+0xdb/0x110
  [<ffffffff8101e4f4>] mce_start_timer+0x64/0x70
  [<ffffffff8101e552>] __mcheck_cpu_init_timer+0x52/0x60
  [<ffffffff8153dc8e>] mcheck_cpu_init+0x6f/0x111
  [<ffffffff8153b3ae>] identify_cpu+0x3cc/0x3f9
  [<ffffffff8153b3ed>] identify_secondary_cpu+0x12/0x1d
  [<ffffffff8153f836>] smp_store_cpu_info+0x3a/0x3c
  [<ffffffff8153f922>] smp_callin+0xea/0x1c1
  [<ffffffff8153fa1d>] start_secondary+0x24/0x97
 ---[ end trace 2c2f8914d9cd6bb0 ]---
 microcode: CPU1 sig=0x10676, pf=0x80, revision=0x60f
 CPU1 is up
 i915 0000:00:02.0: power state changed by ACPI to D0
[ ... snip ... ]

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-09 12:29 NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule Jiri Kosina
@ 2013-05-09 12:50 ` Borislav Petkov
  2013-05-09 12:58   ` Borislav Petkov
  2013-05-10  0:29 ` NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule Frederic Weisbecker
  1 sibling, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2013-05-09 12:50 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Frederic Weisbecker, Tony Luck, linux-kernel, x86

On Thu, May 09, 2013 at 02:29:18PM +0200, Jiri Kosina wrote:
> Hi,
> 
> I just got the warning below when resuming from hibernation with kernel 
> that has NO_HZ_FULL_ALL=y. This is with topmost commit e0fd9affeb640.

Did you boot with any of the NO_HZ_FULL options on the command line,
i.e. rcu_nocbs?

Because I saw the same issue during boot when I was testing the
NO_HZ_FULL stuff previously:

Feb 21 11:13:17 gondor kernel: [    0.093417]  #3
Feb 21 11:13:17 gondor kernel: [    0.106844] SMP alternatives: lockdep: fixing up alternatives
Feb 21 11:13:17 gondor kernel: [    0.118060] ------------[ cut here ]------------
Feb 21 11:13:17 gondor kernel: [    0.118069] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x58/0x60()
Feb 21 11:13:17 gondor kernel: [    0.118069] Hardware name: Precision T3600
Feb 21 11:13:17 gondor kernel: [    0.106955]  #4
Feb 21 11:13:17 gondor kernel: [    0.118070] Modules linked in:
Feb 21 11:13:17 gondor kernel: [    0.118073] Pid: 0, comm: swapper/4 Not tainted 3.8.0-rc6+ #4
Feb 21 11:13:17 gondor kernel: [    0.118074] Call Trace:
Feb 21 11:13:17 gondor kernel: [    0.118076]  [<ffffffff8103d35f>] warn_slowpath_common+0x7f/0xc0
Feb 21 11:13:17 gondor kernel: [    0.118079]  [<ffffffff8103d3ba>] warn_slowpath_null+0x1a/0x20
Feb 21 11:13:17 gondor kernel: [    0.118081]  [<ffffffff81024a58>] native_smp_send_reschedule+0x58/0x60
Feb 21 11:13:17 gondor kernel: [    0.118083]  [<ffffffff81071ba0>] wake_up_nohz_cpu+0x80/0x90
Feb 21 11:13:17 gondor kernel: [    0.118086]  [<ffffffff8104e611>] add_timer_on+0x91/0x110
Feb 21 11:13:17 gondor kernel: [    0.118090]  [<ffffffff8101ccab>] mce_start_timer.isra.12+0x6b/0x80
Feb 21 11:13:17 gondor kernel: [    0.118092]  [<ffffffff8101cd1d>] __mcheck_cpu_init_timer+0x5d/0x70
Feb 21 11:13:17 gondor kernel: [    0.118094]  [<ffffffff815cc466>] mcheck_cpu_init+0x36b/0x400
Feb 21 11:13:17 gondor kernel: [    0.118097]  [<ffffffff815ca15b>] identify_cpu+0x39d/0x3d0
Feb 21 11:13:17 gondor kernel: [    0.118099]  [<ffffffff815ca1a2>] identify_secondary_cpu+0x14/0x1b
Feb 21 11:13:17 gondor kernel: [    0.118101]  [<ffffffff815cd008>] smp_store_cpu_info+0x38/0x3a
Feb 21 11:13:17 gondor kernel: [    0.118103]  [<ffffffff815cd550>] start_secondary+0xfb/0x1f7
Feb 21 11:13:17 gondor kernel: [    0.118121] ---[ end trace 8ffeaaf1f7048154 ]---

And I *think* I saw this yesterday too on a randbuild kernel.

Looks like we're sending a resched IPI to a cpu which is not online yet
in order to start the MCE polling timer. So the rcu* options are kinda
unlikely to be related, AFAICT.

Frederic?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-09 12:50 ` Borislav Petkov
@ 2013-05-09 12:58   ` Borislav Petkov
  2013-05-15 18:45     ` Paul E. McKenney
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2013-05-09 12:58 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Frederic Weisbecker, Tony Luck, linux-kernel, x86

On Thu, May 09, 2013 at 02:50:40PM +0200, Borislav Petkov wrote:
> Looks like we're sending a resched IPI to a cpu which is not online
> yet in order to start the MCE polling timer. So the rcu* options are
> kinda unlikely to be related, AFAICT.

On a second thought, they must be somehow indirectly related because I
had "rcu_nocbs=4-7" on the command line and have the warning on the same
CPUs: 4 - 7.

Actually, NO_HZ_FULL_ALL makes all CPUs full dynticks so I'd guess the
RCU callback offloading happens on all of them, thus the warning can
happen on every CPU which is in full dynticks mode but not online yet.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-09 12:29 NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule Jiri Kosina
  2013-05-09 12:50 ` Borislav Petkov
@ 2013-05-10  0:29 ` Frederic Weisbecker
  2013-05-10  9:28   ` Borislav Petkov
  2013-05-10 15:03   ` Jiri Kosina
  1 sibling, 2 replies; 54+ messages in thread
From: Frederic Weisbecker @ 2013-05-10  0:29 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Tony Luck, Borislav Petkov, linux-kernel, x86

On Thu, May 09, 2013 at 02:29:18PM +0200, Jiri Kosina wrote:
> Hi,
> 
> I just got the warning below when resuming from hibernation with kernel 
> that has NO_HZ_FULL_ALL=y. This is with topmost commit e0fd9affeb640.
> 
> 
> [ ... snip ... ]
>  PM: Hibernation mode set to 'shutdown'
>  PM: Marking nosave pages: [mem 0x0009e000-0x000fffff]
>  PM: Marking nosave pages: [mem 0x7c4a1000-0x7c4a6fff]
>  PM: Marking nosave pages: [mem 0x7c5b7000-0x7c60efff]
>  PM: Marking nosave pages: [mem 0x7c6c6000-0x7c9fefff]
>  PM: Basic memory bitmaps created
>  PM: Syncing filesystems ... done.
>  Freezing user space processes ... (elapsed 0.01 seconds) done.
>  PM: Preallocating image memory... done (allocated 194411 pages)
>  PM: Allocated 777644 kbytes in 0.29 seconds (2681.53 MB/s)
>  Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
>  Suspending console(s) (use no_console_suspend to debug)
>  mei_me 0000:00:03.0: suspend
>  mei_me 0000:00:03.0: wait hw ready failed. status = 0x0
>  PM: freeze of devices complete after 406.995 msecs
>  PM: late freeze of devices complete after 0.543 msecs
>  PM: noirq freeze of devices complete after 1.379 msecs
>  Disabling non-boot CPUs ...
>  smpboot: CPU 1 is now offline
>  PM: Creating hibernation image:
>  PM: Need to copy 193377 pages
>  PM: Normal pages needed: 193377 + 1024, available pages: 316026
>  microcode: CPU0 sig=0x10676, pf=0x80, revision=0x60f
>  Enabling non-boot CPUs ...
>  smpboot: Booting Node 0 Processor 1 APIC 0x1
>  CPU1 microcode updated early to revision 0x60f, date = 2010-09-29
>  Disabled fast string operations
>  ------------[ cut here ]------------
>  WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x59/0x60()
>  Modules linked in: af_packet tun iptable_mangle xt_DSCP nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables cpufreq_conservative cpufreq_userspace cpufreq_powersave iTCO_wdt iTCO_vendor_support snd_hda_codec_conexant snd_hda_intel snd_hda_codec acpi_cpufreq mperf snd_hwdep kvm_intel kvm snd_pcm iwldvm thinkpad_acpi mac80211 snd_seq sg microcode iwlwifi snd_timer snd_seq_device cfg80211 i2c_i801 pcspkr rfkill lpc_ich e1000e mfd_core snd mei_me mei ptp snd_page_alloc ehci_pci pps_core wmi soundcore ac battery tpm_tis tpm tpm_bios autofs4 uhci_hcd i915 ehci_hcd drm_kms_helper drm i2c_algo_bit video usbcore button usb_common edd fan processor ata_generic thermal thermal_sys
>  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.9.0-11789-ge0fd9af #1
>  Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
>   000000000000007b ffff880079851da8 ffffffff81547dde ffff880079851de8
>   ffffffff8104212b ffff880079851dd8 0000000000000001 ffff88007c28cca0
>   ffff880079878000 000000010003fb57 0000000000000096 ffff880079851df8
>  Call Trace:
>   [<ffffffff81547dde>] dump_stack+0x19/0x1b
>   [<ffffffff8104212b>] warn_slowpath_common+0x6b/0xa0
>   [<ffffffff81042175>] warn_slowpath_null+0x15/0x20
>   [<ffffffff81026b09>] native_smp_send_reschedule+0x59/0x60
>   [<ffffffff810774dc>] wake_up_nohz_cpu+0x9c/0xa0
>   [<ffffffff810529db>] add_timer_on+0xdb/0x110
>   [<ffffffff8101e4f4>] mce_start_timer+0x64/0x70
>   [<ffffffff8101e552>] __mcheck_cpu_init_timer+0x52/0x60
>   [<ffffffff8153dc8e>] mcheck_cpu_init+0x6f/0x111
>   [<ffffffff8153b3ae>] identify_cpu+0x3cc/0x3f9
>   [<ffffffff8153b3ed>] identify_secondary_cpu+0x12/0x1d
>   [<ffffffff8153f836>] smp_store_cpu_info+0x3a/0x3c
>   [<ffffffff8153f922>] smp_callin+0xea/0x1c1
>   [<ffffffff8153fa1d>] start_secondary+0x24/0x97

Like Borislav said, it's due to the scheduler IPI sent to an offline
target. Here this is because we enqueue a timer and we must ensure the
target handles this timer by rescheduling its tick if necessary.

But it's weird because the mce timer at this stage should only enqueue
to the current CPU and the tick shouldn't be stopped. So there shouldn't
be an IPI sent.

Could you please apply this patch and tell me what you can see in the
logs?

Thanks.

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 58453b8..19e841a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -616,8 +616,17 @@ static bool wake_up_full_nohz_cpu(int cpu)
 {
 	if (tick_nohz_full_cpu(cpu)) {
 		if (cpu != smp_processor_id() ||
-		    tick_nohz_tick_stopped())
+		    tick_nohz_tick_stopped()) {
+			if (!cpu_online(cpu)) {
+				static int printed = 0;
+				if (!printed) {
+					printk("%d %d\n", cpu, smp_processor_id());
+					dump_stack();
+					printed = 1;
+				}
+			}
 			smp_send_reschedule(cpu);
+		}
 		return true;
 	}
 


>  ---[ end trace 2c2f8914d9cd6bb0 ]---
>  microcode: CPU1 sig=0x10676, pf=0x80, revision=0x60f
>  CPU1 is up
>  i915 0000:00:02.0: power state changed by ACPI to D0
> [ ... snip ... ]
> 
> -- 
> Jiri Kosina
> SUSE Labs

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-10  9:28   ` Borislav Petkov
@ 2013-05-10  9:26     ` Frederic Weisbecker
  2013-05-10  9:37       ` Ingo Molnar
                         ` (2 more replies)
  0 siblings, 3 replies; 54+ messages in thread
From: Frederic Weisbecker @ 2013-05-10  9:26 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Jiri Kosina, Tony Luck, linux-kernel, x86

2013/5/10 Borislav Petkov <bp@alien8.de>:
> On Fri, May 10, 2013 at 02:29:31AM +0200, Frederic Weisbecker wrote:
>> @@ -616,8 +616,17 @@ static bool wake_up_full_nohz_cpu(int cpu)
>>  {
>>       if (tick_nohz_full_cpu(cpu)) {
>>               if (cpu != smp_processor_id() ||
>> -                 tick_nohz_tick_stopped())
>> +                 tick_nohz_tick_stopped()) {
>> +                     if (!cpu_online(cpu)) {
>> +                             static int printed = 0;
>> +                             if (!printed) {
>> +                                     printk("%d %d\n", cpu, smp_processor_id());
>> +                                     dump_stack();
>> +                                     printed = 1;
>
> You know about printk_once, right? We even have all the pr_<level>_once
> macros.
>
> :-)

But but that didn't fit as I also want the dump_stack(). May be we
want dump_stack_once().

In fact it would be nice to have DO_ONCE(something) and stuff whatever
we want inside.
All the printk_once() et. al could even be implemented using that.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-10  0:29 ` NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule Frederic Weisbecker
@ 2013-05-10  9:28   ` Borislav Petkov
  2013-05-10  9:26     ` Frederic Weisbecker
  2013-05-10 15:03   ` Jiri Kosina
  1 sibling, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2013-05-10  9:28 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Jiri Kosina, Tony Luck, linux-kernel, x86

On Fri, May 10, 2013 at 02:29:31AM +0200, Frederic Weisbecker wrote:
> @@ -616,8 +616,17 @@ static bool wake_up_full_nohz_cpu(int cpu)
>  {
>  	if (tick_nohz_full_cpu(cpu)) {
>  		if (cpu != smp_processor_id() ||
> -		    tick_nohz_tick_stopped())
> +		    tick_nohz_tick_stopped()) {
> +			if (!cpu_online(cpu)) {
> +				static int printed = 0;
> +				if (!printed) {
> +					printk("%d %d\n", cpu, smp_processor_id());
> +					dump_stack();
> +					printed = 1;

You know about printk_once, right? We even have all the pr_<level>_once
macros.

:-)

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-10  9:26     ` Frederic Weisbecker
@ 2013-05-10  9:37       ` Ingo Molnar
  2013-05-10  9:45         ` Borislav Petkov
  2013-05-10  9:43       ` Borislav Petkov
  2013-05-10 15:42       ` Jiri Kosina
  2 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2013-05-10  9:37 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Borislav Petkov, Jiri Kosina, Tony Luck, linux-kernel, x86


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> 2013/5/10 Borislav Petkov <bp@alien8.de>:
> > On Fri, May 10, 2013 at 02:29:31AM +0200, Frederic Weisbecker wrote:
> >> @@ -616,8 +616,17 @@ static bool wake_up_full_nohz_cpu(int cpu)
> >>  {
> >>       if (tick_nohz_full_cpu(cpu)) {
> >>               if (cpu != smp_processor_id() ||
> >> -                 tick_nohz_tick_stopped())
> >> +                 tick_nohz_tick_stopped()) {
> >> +                     if (!cpu_online(cpu)) {
> >> +                             static int printed = 0;
> >> +                             if (!printed) {
> >> +                                     printk("%d %d\n", cpu, smp_processor_id());
> >> +                                     dump_stack();
> >> +                                     printed = 1;
> >
> > You know about printk_once, right? We even have all the pr_<level>_once
> > macros.
> >
> > :-)
> 
> But but that didn't fit as I also want the dump_stack(). May be we
> want dump_stack_once().

The pattern I use in such cases is:

	if (WARN_ONCE(!cpu_online(cpu))) {
		printk("%d %d\n", cpu, smp_processor_id());
		dump_stack();
	}

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-10  9:26     ` Frederic Weisbecker
  2013-05-10  9:37       ` Ingo Molnar
@ 2013-05-10  9:43       ` Borislav Petkov
  2013-05-10 15:42       ` Jiri Kosina
  2 siblings, 0 replies; 54+ messages in thread
From: Borislav Petkov @ 2013-05-10  9:43 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Jiri Kosina, Tony Luck, linux-kernel, x86

On Fri, May 10, 2013 at 11:26:39AM +0200, Frederic Weisbecker wrote:
> 2013/5/10 Borislav Petkov <bp@alien8.de>:
> > On Fri, May 10, 2013 at 02:29:31AM +0200, Frederic Weisbecker wrote:
> >> @@ -616,8 +616,17 @@ static bool wake_up_full_nohz_cpu(int cpu)
> >>  {
> >>       if (tick_nohz_full_cpu(cpu)) {
> >>               if (cpu != smp_processor_id() ||
> >> -                 tick_nohz_tick_stopped())
> >> +                 tick_nohz_tick_stopped()) {
> >> +                     if (!cpu_online(cpu)) {
> >> +                             static int printed = 0;
> >> +                             if (!printed) {
> >> +                                     printk("%d %d\n", cpu, smp_processor_id());
> >> +                                     dump_stack();
> >> +                                     printed = 1;
> >
> > You know about printk_once, right? We even have all the pr_<level>_once
> > macros.
> >
> > :-)
> 
> But but that didn't fit as I also want the dump_stack().

Yeah, true - that would flood otherwise.

/me needs to definitely have coffee before reading mail.

> May be we want dump_stack_once().
>
> In fact it would be nice to have DO_ONCE(something) and stuff whatever
> we want inside. All the printk_once() et. al could even be implemented
> using that.

Frankly, I have needed functionality like that in the past too. Let's
see whose itch needs a scratching earlier :-)

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-10  9:37       ` Ingo Molnar
@ 2013-05-10  9:45         ` Borislav Petkov
  2013-05-10 15:31           ` Frederic Weisbecker
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2013-05-10  9:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Jiri Kosina, Tony Luck, linux-kernel, x86

On Fri, May 10, 2013 at 11:37:29AM +0200, Ingo Molnar wrote:
> The pattern I use in such cases is:
> 
> 	if (WARN_ONCE(!cpu_online(cpu))) {
> 		printk("%d %d\n", cpu, smp_processor_id());
> 		dump_stack();
> 	}

Cool, and WARN_ONCE dumps stack already so:

	if (WARN_ONCE(!cpu_online(cpu)))
		printk("%d %d\n", cpu, smp_processor_id());

Nice.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-10  0:29 ` NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule Frederic Weisbecker
  2013-05-10  9:28   ` Borislav Petkov
@ 2013-05-10 15:03   ` Jiri Kosina
  2013-05-10 15:21     ` Borislav Petkov
  1 sibling, 1 reply; 54+ messages in thread
From: Jiri Kosina @ 2013-05-10 15:03 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Tony Luck, Borislav Petkov, linux-kernel, x86

On Fri, 10 May 2013, Frederic Weisbecker wrote:

> Like Borislav said, it's due to the scheduler IPI sent to an offline
> target. Here this is because we enqueue a timer and we must ensure the
> target handles this timer by rescheduling its tick if necessary.
> 
> But it's weird because the mce timer at this stage should only enqueue
> to the current CPU and the tick shouldn't be stopped. So there shouldn't
> be an IPI sent.
> 
> Could you please apply this patch and tell me what you can see in the
> logs?
> 
> Thanks.
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 58453b8..19e841a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -616,8 +616,17 @@ static bool wake_up_full_nohz_cpu(int cpu)
>  {
>  	if (tick_nohz_full_cpu(cpu)) {
>  		if (cpu != smp_processor_id() ||
> -		    tick_nohz_tick_stopped())
> +		    tick_nohz_tick_stopped()) {
> +			if (!cpu_online(cpu)) {
> +				static int printed = 0;
> +				if (!printed) {
> +					printk("%d %d\n", cpu, smp_processor_id());
> +					dump_stack();
> +					printed = 1;
> +				}
> +			}
>  			smp_send_reschedule(cpu);
> +		}
>  		return true;
>  	}

Absolutely, here it goes.

[ ... snip ... ]
 Enabling non-boot CPUs ...
 smpboot: Booting Node 0 Processor 1 APIC 0x1
 CPU1 microcode updated early to revision 0x60f, date = 2010-09-29
 Disabled fast string operations
 1 1
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.9.0-12317-gb2031d4 #1
 Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
  ffff88007c28cca0 ffff880079851e08 ffffffff8154837e ffff880079851e28
  ffffffff81077514 ffff88007c28cca0 ffff88007c28cca0 ffff880079851e68
  ffffffff810529db 0000000179851e78 ffff88007c28cca0 0000000000000001
 Call Trace:
  [<ffffffff8154837e>] dump_stack+0x19/0x1b
  [<ffffffff81077514>] wake_up_nohz_cpu+0xd4/0xf0
  [<ffffffff810529db>] add_timer_on+0xdb/0x110
  [<ffffffff8101e4f4>] mce_start_timer+0x64/0x70
  [<ffffffff8101e552>] __mcheck_cpu_init_timer+0x52/0x60
  [<ffffffff8153e22e>] mcheck_cpu_init+0x6f/0x111
  [<ffffffff8153b94e>] identify_cpu+0x3cc/0x3f9
  [<ffffffff8153b98d>] identify_secondary_cpu+0x12/0x1d
  [<ffffffff8153fdd6>] smp_store_cpu_info+0x3a/0x3c
  [<ffffffff8153fec2>] smp_callin+0xea/0x1c1
  [<ffffffff8153ffbd>] start_secondary+0x24/0x97
 ------------[ cut here ]------------
 WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x59/0x60()
 Modules linked in: af_packet tun iptable_mangle xt_DSCP nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables rfcomm bnep btusb bluetooth cpufreq_conservative cpufreq_userspace cpufreq_powersave iTCO_wdt iTCO_vendor_support snd_hda_codec_conexant acpi_cpufreq mperf kvm_intel kvm iwldvm mac80211 thinkpad_acpi snd_hda_intel sg microcode snd_hda_codec snd_seq iwlwifi snd_hwdep cfg80211 snd_seq_device pcspkr i2c_i801 snd_pcm lpc_ich mfd_core rfkill e1000e snd_timer snd_page_alloc ehci_pci ptp mei_me pps_core mei snd wmi soundcore tpm_tis battery tpm ac tpm_bios autofs4 uhci_hcd ehci_hcd i915 drm_kms_helper drm i2c_algo_bit usbcore button video usb_common edd fan processor ata_generic thermal thermal_sys
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.9.0-12317-gb2031d4 #1
 Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
  000000000000007b ffff880079851da8 ffffffff8154837e ffff880079851de8
  ffffffff8104212b ffff88007c28cca0 0000000000000001 ffff88007c28cca0
  ffff880079878000 0000000100004525 0000000000000096 ffff880079851df8
 Call Trace:
  [<ffffffff8154837e>] dump_stack+0x19/0x1b
  [<ffffffff8104212b>] warn_slowpath_common+0x6b/0xa0
  [<ffffffff81042175>] warn_slowpath_null+0x15/0x20
  [<ffffffff81026b09>] native_smp_send_reschedule+0x59/0x60
  [<ffffffff81077486>] wake_up_nohz_cpu+0x46/0xf0
  [<ffffffff810529db>] add_timer_on+0xdb/0x110
  [<ffffffff8101e4f4>] mce_start_timer+0x64/0x70
  [<ffffffff8101e552>] __mcheck_cpu_init_timer+0x52/0x60
  [<ffffffff8153e22e>] mcheck_cpu_init+0x6f/0x111
  [<ffffffff8153b94e>] identify_cpu+0x3cc/0x3f9
  [<ffffffff8153b98d>] identify_secondary_cpu+0x12/0x1d
  [<ffffffff8153fdd6>] smp_store_cpu_info+0x3a/0x3c
  [<ffffffff8153fec2>] smp_callin+0xea/0x1c1
  [<ffffffff8153ffbd>] start_secondary+0x24/0x97
 ---[ end trace 954b959ede48c006 ]---
 microcode: CPU1 sig=0x10676, pf=0x80, revision=0x60f
 CPU1 is up
 i915 0000:00:02.0: power state changed by ACPI to D0
[ ... snip ... ]

I.e. the "CPU1 is up" happens only after the microcode has been updated, 
but the attempt to send IPI to it indeed seems to happen earlier.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-10 15:03   ` Jiri Kosina
@ 2013-05-10 15:21     ` Borislav Petkov
  2013-05-10 15:43       ` Frederic Weisbecker
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2013-05-10 15:21 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Frederic Weisbecker, Tony Luck, linux-kernel, x86

On Fri, May 10, 2013 at 05:03:56PM +0200, Jiri Kosina wrote:
> [ ... snip ... ]
>  Enabling non-boot CPUs ...
>  smpboot: Booting Node 0 Processor 1 APIC 0x1
>  CPU1 microcode updated early to revision 0x60f, date = 2010-09-29
>  Disabled fast string operations
>  1 1
>  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.9.0-12317-gb2031d4 #1
>  Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
>   ffff88007c28cca0 ffff880079851e08 ffffffff8154837e ffff880079851e28
>   ffffffff81077514 ffff88007c28cca0 ffff88007c28cca0 ffff880079851e68
>   ffffffff810529db 0000000179851e78 ffff88007c28cca0 0000000000000001
>  Call Trace:
>   [<ffffffff8154837e>] dump_stack+0x19/0x1b
>   [<ffffffff81077514>] wake_up_nohz_cpu+0xd4/0xf0
>   [<ffffffff810529db>] add_timer_on+0xdb/0x110
>   [<ffffffff8101e4f4>] mce_start_timer+0x64/0x70
>   [<ffffffff8101e552>] __mcheck_cpu_init_timer+0x52/0x60
>   [<ffffffff8153e22e>] mcheck_cpu_init+0x6f/0x111
>   [<ffffffff8153b94e>] identify_cpu+0x3cc/0x3f9
>   [<ffffffff8153b98d>] identify_secondary_cpu+0x12/0x1d
>   [<ffffffff8153fdd6>] smp_store_cpu_info+0x3a/0x3c
>   [<ffffffff8153fec2>] smp_callin+0xea/0x1c1
>   [<ffffffff8153ffbd>] start_secondary+0x24/0x97

Ok, I got it:

smp_callin is called by start_secondary() and down that path we add the
timer and do wake_up_nohz_cpu.

HOWEVER(!), the bit in the cpu_online_mask is set much later in
smp_callin() with

	set_cpu_online(smp_processor_id(), true);

Thus, when we come to send the IPI, the cpu is still offline, according
to the cpu_online_mask, thus the WARN_ON.

Nice :-\

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-10  9:45         ` Borislav Petkov
@ 2013-05-10 15:31           ` Frederic Weisbecker
  0 siblings, 0 replies; 54+ messages in thread
From: Frederic Weisbecker @ 2013-05-10 15:31 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Ingo Molnar, Jiri Kosina, Tony Luck, linux-kernel, x86

On Fri, May 10, 2013 at 11:45:56AM +0200, Borislav Petkov wrote:
> On Fri, May 10, 2013 at 11:37:29AM +0200, Ingo Molnar wrote:
> > The pattern I use in such cases is:
> > 
> > 	if (WARN_ONCE(!cpu_online(cpu))) {
> > 		printk("%d %d\n", cpu, smp_processor_id());
> > 		dump_stack();
> > 	}
> 
> Cool, and WARN_ONCE dumps stack already so:
> 
> 	if (WARN_ONCE(!cpu_online(cpu)))
> 		printk("%d %d\n", cpu, smp_processor_id());

Even better:

 	if (WARN_ONCE(!cpu_online(cpu)))
 		printk_once("%d %d\n", cpu, smp_processor_id());

So the printk doesn't flood after the first case.

> 
> Nice.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Sent from a fat crate under my desk. Formatting is fine.
> --

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-10  9:26     ` Frederic Weisbecker
  2013-05-10  9:37       ` Ingo Molnar
  2013-05-10  9:43       ` Borislav Petkov
@ 2013-05-10 15:42       ` Jiri Kosina
  2 siblings, 0 replies; 54+ messages in thread
From: Jiri Kosina @ 2013-05-10 15:42 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Borislav Petkov, Tony Luck, linux-kernel, x86

On Fri, 10 May 2013, Frederic Weisbecker wrote:

> In fact it would be nice to have DO_ONCE(something) and stuff whatever
> we want inside.
> All the printk_once() et. al could even be implemented using that.

Sounds nice, but if it's going to be used for something else than purely 
debugging output, one has to keep in mind that printk_once() as currently 
implemented is actually racy.

I.e. you'd need to have locking (or at least cmpxchg) protecting the 
update of the static variable.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-10 15:21     ` Borislav Petkov
@ 2013-05-10 15:43       ` Frederic Weisbecker
  2013-05-10 16:23         ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Frederic Weisbecker @ 2013-05-10 15:43 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Jiri Kosina, Tony Luck, linux-kernel, x86

On Fri, May 10, 2013 at 05:21:02PM +0200, Borislav Petkov wrote:
> On Fri, May 10, 2013 at 05:03:56PM +0200, Jiri Kosina wrote:
> > [ ... snip ... ]
> >  Enabling non-boot CPUs ...
> >  smpboot: Booting Node 0 Processor 1 APIC 0x1
> >  CPU1 microcode updated early to revision 0x60f, date = 2010-09-29
> >  Disabled fast string operations
> >  1 1
> >  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.9.0-12317-gb2031d4 #1
> >  Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
> >   ffff88007c28cca0 ffff880079851e08 ffffffff8154837e ffff880079851e28
> >   ffffffff81077514 ffff88007c28cca0 ffff88007c28cca0 ffff880079851e68
> >   ffffffff810529db 0000000179851e78 ffff88007c28cca0 0000000000000001
> >  Call Trace:
> >   [<ffffffff8154837e>] dump_stack+0x19/0x1b
> >   [<ffffffff81077514>] wake_up_nohz_cpu+0xd4/0xf0
> >   [<ffffffff810529db>] add_timer_on+0xdb/0x110
> >   [<ffffffff8101e4f4>] mce_start_timer+0x64/0x70
> >   [<ffffffff8101e552>] __mcheck_cpu_init_timer+0x52/0x60
> >   [<ffffffff8153e22e>] mcheck_cpu_init+0x6f/0x111
> >   [<ffffffff8153b94e>] identify_cpu+0x3cc/0x3f9
> >   [<ffffffff8153b98d>] identify_secondary_cpu+0x12/0x1d
> >   [<ffffffff8153fdd6>] smp_store_cpu_info+0x3a/0x3c
> >   [<ffffffff8153fec2>] smp_callin+0xea/0x1c1
> >   [<ffffffff8153ffbd>] start_secondary+0x24/0x97
> 
> Ok, I got it:
> 
> smp_callin is called by start_secondary() and down that path we add the
> timer and do wake_up_nohz_cpu.
> 
> HOWEVER(!), the bit in the cpu_online_mask is set much later in
> smp_callin() with
> 
> 	set_cpu_online(smp_processor_id(), true);
> 
> Thus, when we come to send the IPI, the cpu is still offline, according
> to the cpu_online_mask, thus the WARN_ON.
> 
> Nice :-\

Right. But this is adding a timer locally, from CPU 1 to CPU 1, as indicated in the
trace with the "1 1" line. So the only way for this IPI to be self-sent is if the
tick is stopped locally (cf: wake_up_full_nohz_cpu()).

But the tick is not supposed to be stopped so early in a secondary CPU initialization.
The tick can be stopped only from two places:

1) idle loop, but we haven't yet reached that place. cpu_idle() is called much later
2) interrupt exit, but interrupts are supposed to be disabled at this stage

So either interrupts are spuriously enabled early, or ts->tick_stopped is
not correctly initialized.

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Sent from a fat crate under my desk. Formatting is fine.
> --

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-10 15:43       ` Frederic Weisbecker
@ 2013-05-10 16:23         ` Borislav Petkov
  2013-05-10 21:38           ` Frederic Weisbecker
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2013-05-10 16:23 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Jiri Kosina, Tony Luck, linux-kernel, x86

On Fri, May 10, 2013 at 05:43:50PM +0200, Frederic Weisbecker wrote:
> Right. But this is adding a timer locally, from CPU 1 to CPU 1, as
> indicated in the trace with the "1 1" line. So the only way for
> this IPI to be self-sent is if the tick is stopped locally (cf:
> wake_up_full_nohz_cpu()).
>
> But the tick is not supposed to be stopped so early in a secondary CPU
> initialization. The tick can be stopped only from two places:
>
> 1) idle loop, but we haven't yet reached that place. cpu_idle() is
> called much later 2) interrupt exit, but interrupts are supposed to be
> disabled at this stage

Hmm, I see.

> So either interrupts are spuriously enabled early, or ts->tick_stopped
> is not correctly initialized.

Hmm, it can't be interrupts disabled because add_timer_on() does
spin_lock_irqsave() before calling wake_up_nohz_cpu()... Maybe something
like the below could help check this...

Although AFAICT, we're enabling interrupts much later in
start_secondary, even after we've set the bit in the online mask.

---
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 9c73b51817e4..1b679b0fa57a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -201,6 +201,8 @@ static void __cpuinit smp_callin(void)
 	 */
 	setup_vector_irq(smp_processor_id());
 
+	WARN_ON(!irqs_disabled());
+
 	/*
 	 * Save our processor parameters. Note: this information
 	 * is needed for clock calibration.


-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-10 16:23         ` Borislav Petkov
@ 2013-05-10 21:38           ` Frederic Weisbecker
  2013-05-13 14:56             ` Jiri Kosina
  0 siblings, 1 reply; 54+ messages in thread
From: Frederic Weisbecker @ 2013-05-10 21:38 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Jiri Kosina, Tony Luck, linux-kernel, x86

On Fri, May 10, 2013 at 06:23:40PM +0200, Borislav Petkov wrote:
> On Fri, May 10, 2013 at 05:43:50PM +0200, Frederic Weisbecker wrote:
> > So either interrupts are spuriously enabled early, or ts->tick_stopped
> > is not correctly initialized.
> 
> Hmm, it can't be interrupts disabled because add_timer_on() does
> spin_lock_irqsave() before calling wake_up_nohz_cpu()... Maybe something
> like the below could help check this...

Right, by the time we call wake_up_nohz_cpu(), irqs can't be enabled, but may
be they were enabled before in start_secondary().

> 
> Although AFAICT, we're enabling interrupts much later in
> start_secondary, even after we've set the bit in the online mask.

Right.

> 
> ---
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 9c73b51817e4..1b679b0fa57a 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -201,6 +201,8 @@ static void __cpuinit smp_callin(void)
>  	 */
>  	setup_vector_irq(smp_processor_id());
>  
> +	WARN_ON(!irqs_disabled());
> +

The problem is that it doesn't catch issues with irqs that have been enabled
before in start_secondary(), then re-disabled somewhow. Warning on offline CPU from the place 
that disables the tick should catch the issue.

Jiri, could you test the following patch? I also added some code to dump
the value of ts->tick_stopped, in case it's not well initialized or something.

Note this may give you spurious warning when you unplug a CPU or when you shutdown the
system. But it's interesting if it dumps something in the boot.

Thanks!

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 58453b8..9853125 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -616,8 +616,17 @@ static bool wake_up_full_nohz_cpu(int cpu)
 {
 	if (tick_nohz_full_cpu(cpu)) {
 		if (cpu != smp_processor_id() ||
-		    tick_nohz_tick_stopped())
+		    tick_nohz_tick_stopped()) {
+			if (!cpu_online(cpu)) {
+				static int printed = 0;
+				if (!printed) {
+					printk("src: %d dst: %d stopped: %d\n", cpu, smp_processor_id(), tick_nohz_tick_stopped());
+					dump_stack();
+					printed = 1;
+				}
+			}
 			smp_send_reschedule(cpu);
+		}
 		return true;
 	}
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index bc67d42..abfa8c3 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -650,6 +650,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 
 			ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 			ts->tick_stopped = 1;
+			WARN_ON_ONCE(!cpu_online(cpu));
 			trace_tick_stop(1, " ");
 		}
 


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-10 21:38           ` Frederic Weisbecker
@ 2013-05-13 14:56             ` Jiri Kosina
  2013-05-13 19:40               ` Thomas Gleixner
  0 siblings, 1 reply; 54+ messages in thread
From: Jiri Kosina @ 2013-05-13 14:56 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Borislav Petkov, Tony Luck, linux-kernel, x86

On Fri, 10 May 2013, Frederic Weisbecker wrote:

> The problem is that it doesn't catch issues with irqs that have been enabled
> before in start_secondary(), then re-disabled somewhow. Warning on offline CPU from the place 
> that disables the tick should catch the issue.
> 
> Jiri, could you test the following patch? I also added some code to dump
> the value of ts->tick_stopped, in case it's not well initialized or something.
> 
> Note this may give you spurious warning when you unplug a CPU or when you shutdown the
> system. But it's interesting if it dumps something in the boot.
> 
> Thanks!
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 58453b8..9853125 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -616,8 +616,17 @@ static bool wake_up_full_nohz_cpu(int cpu)
>  {
>  	if (tick_nohz_full_cpu(cpu)) {
>  		if (cpu != smp_processor_id() ||
> -		    tick_nohz_tick_stopped())
> +		    tick_nohz_tick_stopped()) {
> +			if (!cpu_online(cpu)) {
> +				static int printed = 0;
> +				if (!printed) {
> +					printk("src: %d dst: %d stopped: %d\n", cpu, smp_processor_id(), tick_nohz_tick_stopped());
> +					dump_stack();
> +					printed = 1;
> +				}
> +			}
>  			smp_send_reschedule(cpu);
> +		}
>  		return true;
>  	}
>  
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index bc67d42..abfa8c3 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -650,6 +650,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>  
>  			ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
>  			ts->tick_stopped = 1;
> +			WARN_ON_ONCE(!cpu_online(cpu));
>  			trace_tick_stop(1, " ");
>  		}

Hi Frederic,

I am not getting anything on boot (and I have never seen any warning on 
boot), but suspend-resume cycle always triggers this.

With the patch above, I am getting:


[ ... snip ... ]
 PM: freeze of devices complete after 419.034 msecs
 PM: late freeze of devices complete after 0.589 msecs
 PM: noirq freeze of devices complete after 1.448 msecs
 Disabling non-boot CPUs ...
 ------------[ cut here ]------------
 WARNING: at kernel/time/tick-sched.c:653 tick_nohz_stop_sched_tick+0x38e/0x3a0()
 Modules linked in: af_packet tun iptable_mangle xt_DSCP nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables rfcomm bnep btusb bluetooth iTCO_wdt cpufreq_conservative cpufreq_userspace iTCO_vendor_support cpufreq_powersave acpi_cpufreq mperf kvm_intel kvm snd_hda_codec_conexant snd_hda_intel snd_hda_codec microcode snd_hwdep sg snd_pcm iwldvm thinkpad_acpi mac80211 snd_seq iwlwifi pcspkr i2c_i801 cfg80211 lpc_ich mfd_core snd_timer snd_seq_device rfkill e1000e snd ptp mei_me snd_page_alloc mei pps_core ehci_pci wmi tpm_tis soundcore ac battery tpm tpm_bios autofs4 uhci_hcd ehci_hcd usbcore usb_common i915 drm_kms_helper drm i2c_algo_bit video button edd fan processor ata_generic thermal thermal_sys
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.9.0-12317-g44bb655 #1
 Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
  000000000000028d ffff880079851dc8 ffffffff815483ce ffff880079851e08
  ffffffff8104212b 000000000000d340 000000003fffffff 7fffffffffffffff
  ffff88007c28d640 00000000ffff1b2f 0000000f4b8c5f00 ffff880079851e18
 Call Trace:
  [<ffffffff815483ce>] dump_stack+0x19/0x1b
  [<ffffffff8104212b>] warn_slowpath_common+0x6b/0xa0
  [<ffffffff81042175>] warn_slowpath_null+0x15/0x20
  [<ffffffff8109f13e>] tick_nohz_stop_sched_tick+0x38e/0x3a0
  [<ffffffff8109f27b>] __tick_nohz_idle_enter+0x12b/0x170
  [<ffffffff8109f2ed>] tick_nohz_idle_enter+0x2d/0x60
  [<ffffffff810944c5>] cpu_idle_loop+0x35/0x230
  [<ffffffff810946de>] cpu_startup_entry+0x1e/0x20
  [<ffffffff81540072>] start_secondary+0x89/0x97
 ---[ end trace ecffd04d10ec9f65 ]---
 smpboot: CPU 1 is now offline
 PM: Creating hibernation image:
 PM: Need to copy 194352 pages
 PM: Normal pages needed: 194352 + 1024, available pages: 315053
 microcode: CPU0 sig=0x10676, pf=0x80, revision=0x60f
 Enabling non-boot CPUs ...
 smpboot: Booting Node 0 Processor 1 APIC 0x1
 CPU1 microcode updated early to revision 0x60f, date = 2010-09-29
 Disabled fast string operations
 src: 1 dst: 1 stopped: 1
 CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W    3.9.0-12317-g44bb655 #1
 Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
  ffff88007c28cca0 ffff880079851e08 ffffffff815483ce ffff880079851e28
  ffffffff8107751c ffff88007c28cca0 ffff88007c28cca0 ffff880079851e68
  ffffffff810529db 0000000179851e78 ffff88007c28cca0 0000000000000001
 Call Trace:
  [<ffffffff815483ce>] dump_stack+0x19/0x1b
  [<ffffffff8107751c>] wake_up_nohz_cpu+0xdc/0xf0
  [<ffffffff810529db>] add_timer_on+0xdb/0x110
  [<ffffffff8101e4f4>] mce_start_timer+0x64/0x70
  [<ffffffff8101e552>] __mcheck_cpu_init_timer+0x52/0x60
  [<ffffffff8153e27e>] mcheck_cpu_init+0x6f/0x111
  [<ffffffff8153b99e>] identify_cpu+0x3cc/0x3f9
  [<ffffffff8153b9dd>] identify_secondary_cpu+0x12/0x1d
  [<ffffffff8153fe26>] smp_store_cpu_info+0x3a/0x3c
  [<ffffffff8153ff12>] smp_callin+0xea/0x1c1
  [<ffffffff8154000d>] start_secondary+0x24/0x97
 ------------[ cut here ]------------
 WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x59/0x60()
 Modules linked in: af_packet tun iptable_mangle xt_DSCP nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables rfcomm bnep btusb bluetooth iTCO_wdt cpufreq_conservative cpufreq_userspace iTCO_vendor_support cpufreq_powersave acpi_cpufreq mperf kvm_intel kvm snd_hda_codec_conexant snd_hda_intel snd_hda_codec microcode snd_hwdep sg snd_pcm iwldvm thinkpad_acpi mac80211 snd_seq iwlwifi pcspkr i2c_i801 cfg80211 lpc_ich mfd_core snd_timer snd_seq_device rfkill e1000e snd ptp mei_me snd_page_alloc mei pps_core ehci_pci wmi tpm_tis soundcore ac battery tpm tpm_bios autofs4 uhci_hcd ehci_hcd usbcore usb_common i915 drm_kms_helper drm i2c_algo_bit video button edd fan processor ata_generic thermal thermal_sys
 CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W    3.9.0-12317-g44bb655 #1
 Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
  000000000000007b ffff880079851da8 ffffffff815483ce ffff880079851de8
  ffffffff8104212b ffff88007c28cca0 0000000000000001 ffff88007c28cca0
  ffff880079878000 0000000100004043 0000000000000096 ffff880079851df8
 Call Trace:
  [<ffffffff815483ce>] dump_stack+0x19/0x1b
  [<ffffffff8104212b>] warn_slowpath_common+0x6b/0xa0
  [<ffffffff81042175>] warn_slowpath_null+0x15/0x20
  [<ffffffff81026b09>] native_smp_send_reschedule+0x59/0x60
  [<ffffffff81077486>] wake_up_nohz_cpu+0x46/0xf0
  [<ffffffff810529db>] add_timer_on+0xdb/0x110
  [<ffffffff8101e4f4>] mce_start_timer+0x64/0x70
  [<ffffffff8101e552>] __mcheck_cpu_init_timer+0x52/0x60
  [<ffffffff8153e27e>] mcheck_cpu_init+0x6f/0x111
  [<ffffffff8153b99e>] identify_cpu+0x3cc/0x3f9
  [<ffffffff8153b9dd>] identify_secondary_cpu+0x12/0x1d
  [<ffffffff8153fe26>] smp_store_cpu_info+0x3a/0x3c
  [<ffffffff8153ff12>] smp_callin+0xea/0x1c1
  [<ffffffff8154000d>] start_secondary+0x24/0x97
 ---[ end trace ecffd04d10ec9f66 ]---
 microcode: CPU1 sig=0x10676, pf=0x80, revision=0x60f
 CPU1 is up
[ ... snip ... ]

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-13 14:56             ` Jiri Kosina
@ 2013-05-13 19:40               ` Thomas Gleixner
  2013-05-13 20:01                 ` Jiri Kosina
  2013-05-14 15:46                 ` [tip:timers/urgent] tick: Don't invoke tick_nohz_stop_sched_tick( ) if the cpu is offline tip-bot for Thomas Gleixner
  0 siblings, 2 replies; 54+ messages in thread
From: Thomas Gleixner @ 2013-05-13 19:40 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Frederic Weisbecker, Borislav Petkov, Tony Luck, linux-kernel, x86

On Mon, 13 May 2013, Jiri Kosina wrote:
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -650,6 +650,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> >  
> >  			ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
> >  			ts->tick_stopped = 1;
> > +			WARN_ON_ONCE(!cpu_online(cpu));

So that warning triggers.

>  WARNING: at kernel/time/tick-sched.c:653 tick_nohz_stop_sched_tick+0x38e/0x3a0()

The pre full dyntick idle code bailed out when a cpu was offline. The
new fangled can_stop_idle_tick() function dropped that.

Does the patch below fix the issue?

Thanks,

	tglx

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index bc67d42..00a9a97 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -717,6 +717,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 	if (unlikely(!cpu_online(cpu))) {
 		if (cpu == tick_do_timer_cpu)
 			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
+		return false;
 	}
 
 	if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-13 19:40               ` Thomas Gleixner
@ 2013-05-13 20:01                 ` Jiri Kosina
  2013-05-14 15:46                 ` [tip:timers/urgent] tick: Don't invoke tick_nohz_stop_sched_tick( ) if the cpu is offline tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 54+ messages in thread
From: Jiri Kosina @ 2013-05-13 20:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Frederic Weisbecker, Borislav Petkov, Tony Luck, linux-kernel, x86

On Mon, 13 May 2013, Thomas Gleixner wrote:

> > > --- a/kernel/time/tick-sched.c
> > > +++ b/kernel/time/tick-sched.c
> > > @@ -650,6 +650,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > >  
> > >  			ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
> > >  			ts->tick_stopped = 1;
> > > +			WARN_ON_ONCE(!cpu_online(cpu));
> 
> So that warning triggers.
> 
> >  WARNING: at kernel/time/tick-sched.c:653 tick_nohz_stop_sched_tick+0x38e/0x3a0()
> 
> The pre full dyntick idle code bailed out when a cpu was offline. The
> new fangled can_stop_idle_tick() function dropped that.
> 
> Does the patch below fix the issue?
> 
> Thanks,
> 
> 	tglx
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index bc67d42..00a9a97 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -717,6 +717,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>  	if (unlikely(!cpu_online(cpu))) {
>  		if (cpu == tick_do_timer_cpu)
>  			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
> +		return false;
>  	}
>  
>  	if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))

The warning is gone, so it definitely looks like you found a culprit, 
Thomas, thanks.

But I am not going to provide my Tested-by: yet, as one of the four 
suspend-resume cycles I have done during testing of this patch ended up 
with segfaults when trying to start any userspace binary after resume 
(i.e. clear memory corruption). The remaining three suspend-resume 
cycles were fine.

Will be looking into this a little bit more.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [tip:timers/urgent] tick: Don't invoke tick_nohz_stop_sched_tick( ) if the cpu is offline
  2013-05-13 19:40               ` Thomas Gleixner
  2013-05-13 20:01                 ` Jiri Kosina
@ 2013-05-14 15:46                 ` tip-bot for Thomas Gleixner
  2013-05-15 19:41                   ` Frederic Weisbecker
  1 sibling, 1 reply; 54+ messages in thread
From: tip-bot for Thomas Gleixner @ 2013-05-14 15:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, tony.luck, jkosina, bp, fweisbec, tglx, prarit

Commit-ID:  f7ea0fd639c2c48d3c61b6eec75362be290c6874
Gitweb:     http://git.kernel.org/tip/f7ea0fd639c2c48d3c61b6eec75362be290c6874
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 13 May 2013 21:40:27 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 14 May 2013 17:40:31 +0200

tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline

commit 5b39939a4 (nohz: Move ts->idle_calls incrementation into strict
idle logic) moved code out of tick_nohz_stop_sched_tick() and missed
to bail out when the cpu is offline. That's causing subsequent
failures as an offline CPU is supposed to die and not to fiddle with
nohz magic.

Return false in can_stop_idle_tick() if the cpu is offline.

Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz>
Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305132138160.2863@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-sched.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 0eed1db..0121421 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -469,6 +469,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 	if (unlikely(!cpu_online(cpu))) {
 		if (cpu == tick_do_timer_cpu)
 			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
+		return false;
 	}
 
 	if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-09 12:58   ` Borislav Petkov
@ 2013-05-15 18:45     ` Paul E. McKenney
  2013-05-15 22:43       ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Paul E. McKenney @ 2013-05-15 18:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jiri Kosina, Frederic Weisbecker, Tony Luck, linux-kernel, x86

On Thu, May 09, 2013 at 02:58:59PM +0200, Borislav Petkov wrote:
> On Thu, May 09, 2013 at 02:50:40PM +0200, Borislav Petkov wrote:
> > Looks like we're sending a resched IPI to a cpu which is not online
> > yet in order to start the MCE polling timer. So the rcu* options are
> > kinda unlikely to be related, AFAICT.
> 
> On a second thought, they must be somehow indirectly related because I
> had "rcu_nocbs=4-7" on the command line and have the warning on the same
> CPUs: 4 - 7.
> 
> Actually, NO_HZ_FULL_ALL makes all CPUs full dynticks so I'd guess the
> RCU callback offloading happens on all of them, thus the warning can
> happen on every CPU which is in full dynticks mode but not online yet.

Does the following patch help?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 63098a5..68b1f26 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2371,7 +2371,7 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
 static void rcu_kick_nohz_cpu(int cpu)
 {
 #ifdef CONFIG_NO_HZ_FULL
-	if (tick_nohz_full_cpu(cpu))
+	if (tick_nohz_full_cpu(cpu) && cpu_online(cpu))
 		smp_send_reschedule(cpu);
 #endif /* #ifdef CONFIG_NO_HZ_FULL */
 }


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [tip:timers/urgent] tick: Don't invoke tick_nohz_stop_sched_tick( ) if the cpu is offline
  2013-05-14 15:46                 ` [tip:timers/urgent] tick: Don't invoke tick_nohz_stop_sched_tick( ) if the cpu is offline tip-bot for Thomas Gleixner
@ 2013-05-15 19:41                   ` Frederic Weisbecker
  2013-05-16 14:06                     ` Thomas Gleixner
  0 siblings, 1 reply; 54+ messages in thread
From: Frederic Weisbecker @ 2013-05-15 19:41 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, jkosina, tony.luck, bp, tglx, prarit
  Cc: linux-tip-commits

On Tue, May 14, 2013 at 08:46:59AM -0700, tip-bot for Thomas Gleixner wrote:
> Commit-ID:  f7ea0fd639c2c48d3c61b6eec75362be290c6874
> Gitweb:     http://git.kernel.org/tip/f7ea0fd639c2c48d3c61b6eec75362be290c6874
> Author:     Thomas Gleixner <tglx@linutronix.de>
> AuthorDate: Mon, 13 May 2013 21:40:27 +0200
> Committer:  Thomas Gleixner <tglx@linutronix.de>
> CommitDate: Tue, 14 May 2013 17:40:31 +0200
> 
> tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline
> 
> commit 5b39939a4 (nohz: Move ts->idle_calls incrementation into strict
> idle logic) moved code out of tick_nohz_stop_sched_tick() and missed
> to bail out when the cpu is offline. That's causing subsequent
> failures as an offline CPU is supposed to die and not to fiddle with
> nohz magic.

I don't see the logic has changed after this commit. The old nohz
code was already shutting down the tick on offline CPUs.

Am I missing something?
I'm not sure which behaviour we want though, nor what the side
effect could be whether we keep or shutdown the tick.

There is also the problem of full dynticks CPUs that have their
tick stopped and then get later offlined. They may end up in the idle loop
with the tick already stopped and it's not going to be restarted. I believe
that can trigger the same issue.

I'm just not sure what I should do: restart the tick when the CPU
is offlining? What could be the side effect of that?

Thanks.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-15 18:45     ` Paul E. McKenney
@ 2013-05-15 22:43       ` Borislav Petkov
  2013-05-15 23:55         ` Paul E. McKenney
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2013-05-15 22:43 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Jiri Kosina, Frederic Weisbecker, Tony Luck, linux-kernel, x86

On Wed, May 15, 2013 at 11:45:28AM -0700, Paul E. McKenney wrote:
> Does the following patch help?

Hmm, I just tried on 3.10-rc1

CONFIG_NO_HZ_FULL_ALL=y

on the one hand and then

CONFIG_NO_HZ_FULL=y
# CONFIG_NO_HZ_FULL_ALL is not set

with "nohz_full=4-7 rcu_nocbs=4-7" on the cmdline and I don't see the
splats anymore.

Do I need to try something else?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule
  2013-05-15 22:43       ` Borislav Petkov
@ 2013-05-15 23:55         ` Paul E. McKenney
  2013-05-17 13:56           ` NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2 Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Paul E. McKenney @ 2013-05-15 23:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jiri Kosina, Frederic Weisbecker, Tony Luck, linux-kernel, x86

On Thu, May 16, 2013 at 12:43:58AM +0200, Borislav Petkov wrote:
> On Wed, May 15, 2013 at 11:45:28AM -0700, Paul E. McKenney wrote:
> > Does the following patch help?
> 
> Hmm, I just tried on 3.10-rc1
> 
> CONFIG_NO_HZ_FULL_ALL=y
> 
> on the one hand and then
> 
> CONFIG_NO_HZ_FULL=y
> # CONFIG_NO_HZ_FULL_ALL is not set
> 
> with "nohz_full=4-7 rcu_nocbs=4-7" on the cmdline and I don't see the
> splats anymore.
> 
> Do I need to try something else?

I never saw the problem, so I have to defer to you on this one.  I will
hold off on the patch unless the problem shows up again.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [tip:timers/urgent] tick: Don't invoke tick_nohz_stop_sched_tick( ) if the cpu is offline
  2013-05-15 19:41                   ` Frederic Weisbecker
@ 2013-05-16 14:06                     ` Thomas Gleixner
  2013-05-16 14:15                       ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Gleixner @ 2013-05-16 14:06 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: mingo, hpa, linux-kernel, jkosina, tony.luck, bp, prarit,
	linux-tip-commits

On Wed, 15 May 2013, Frederic Weisbecker wrote:

> On Tue, May 14, 2013 at 08:46:59AM -0700, tip-bot for Thomas Gleixner wrote:
> > Commit-ID:  f7ea0fd639c2c48d3c61b6eec75362be290c6874
> > Gitweb:     http://git.kernel.org/tip/f7ea0fd639c2c48d3c61b6eec75362be290c6874
> > Author:     Thomas Gleixner <tglx@linutronix.de>
> > AuthorDate: Mon, 13 May 2013 21:40:27 +0200
> > Committer:  Thomas Gleixner <tglx@linutronix.de>
> > CommitDate: Tue, 14 May 2013 17:40:31 +0200
> > 
> > tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline
> > 
> > commit 5b39939a4 (nohz: Move ts->idle_calls incrementation into strict
> > idle logic) moved code out of tick_nohz_stop_sched_tick() and missed
> > to bail out when the cpu is offline. That's causing subsequent
> > failures as an offline CPU is supposed to die and not to fiddle with
> > nohz magic.
> 
> I don't see the logic has changed after this commit. The old nohz
> code was already shutting down the tick on offline CPUs.

Hmm, you're right. Soemthing tricked me.
 
> Am I missing something?
> I'm not sure which behaviour we want though, nor what the side
> effect could be whether we keep or shutdown the tick.
>
> There is also the problem of full dynticks CPUs that have their
> tick stopped and then get later offlined. They may end up in the idle loop
> with the tick already stopped and it's not going to be restarted. I believe
> that can trigger the same issue.
> 
> I'm just not sure what I should do: restart the tick when the CPU
> is offlining? What could be the side effect of that?

No, it's not about restarting the tick. It's about not falling into
the nohz functions when the cpu is about to die. Just leave it alone,
that's what the patch does.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [tip:timers/urgent] tick: Don't invoke tick_nohz_stop_sched_tick( ) if the cpu is offline
  2013-05-16 14:06                     ` Thomas Gleixner
@ 2013-05-16 14:15                       ` Borislav Petkov
  0 siblings, 0 replies; 54+ messages in thread
From: Borislav Petkov @ 2013-05-16 14:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Frederic Weisbecker, mingo, hpa, linux-kernel, jkosina,
	tony.luck, prarit, linux-tip-commits

On Thu, May 16, 2013 at 04:06:58PM +0200, Thomas Gleixner wrote:
> No, it's not about restarting the tick. It's about not falling into
> the nohz functions when the cpu is about to die. Just leave it alone,
> that's what the patch does.

Yep, this is how I saw it too: why even fiddle when the tick when we're
going offline anyway. We can look at the tick when we come back online
and enter idle.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-15 23:55         ` Paul E. McKenney
@ 2013-05-17 13:56           ` Borislav Petkov
  2013-05-20  3:16             ` Michael Wang
  2013-06-04 21:20             ` Jiri Kosina
  0 siblings, 2 replies; 54+ messages in thread
From: Borislav Petkov @ 2013-05-17 13:56 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Jiri Kosina, Frederic Weisbecker, Tony Luck, linux-kernel, x86,
	Thomas Gleixner

On Wed, May 15, 2013 at 04:55:13PM -0700, Paul E. McKenney wrote:
> I never saw the problem, so I have to defer to you on this one. I will
> hold off on the patch unless the problem shows up again.

Thanks Paul.

Well, it's not this problem, but another one. Let me check if everyone
is on CC... nope, tglx is missing, added.

Ok, here's the setup:

CONFIG_NO_HZ_COMMON=y
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ_FULL_ALL=y
CONFIG_NO_HZ=y
CONFIG_RCU_FAST_NO_HZ=y

Tree is Linus from today + tip/master, i.e. it has all fixes, even

commit f7ea0fd639c2c48d3c61b6eec75362be290c6874
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon May 13 21:40:27 2013 +0200

    tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline

Now, when I halt the box, I see these splats originating from cpufreq's
od_dbs_timer adding a workqueue which does add_timer_on:


[   49.338878] EXT4-fs (sda7): re-mounted. Opts: (null)
[   51.502417] kvm: exiting hardware virtualization
[   51.597330] ACPI: Preparing to enter system sleep state S5
[   51.603147] Disabling non-boot CPUs ...
[   51.616759] ------------[ cut here ]------------
[   51.621460] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x58/0x60()
[   51.629638] Modules linked in: ext2 vfat fat loop snd_hda_codec_hdmi usbhid snd_hda_codec_realtek coretemp kvm_intel kvm snd_hda_intel snd_hda_codec crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel sb_edac aes_x86_64 ehci_pci snd_page_alloc glue_helper snd_timer xhci_hcd snd iTCO_wdt iTCO_vendor_support ehci_hcd edac_core lpc_ich acpi_cpufreq lrw gf128mul ablk_helper cryptd mperf usbcore usb_common soundcore mfd_core dcdbas evdev pcspkr processor i2c_i801 button microcode
[   51.675581] CPU: 0 PID: 244 Comm: kworker/1:1 Tainted: G        W    3.10.0-rc1+ #10
[   51.683407] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
[   51.690901] Workqueue: events od_dbs_timer
[   51.695069]  0000000000000009 ffff88043a2f5b68 ffffffff8161441c ffff88043a2f5ba8
[   51.702602]  ffffffff8103e540 0000000000000033 0000000000000001 ffff88043d5f8000
[   51.710136]  00000000ffff0ce1 0000000000000001 ffff88044fc4fc08 ffff88043a2f5bb8
[   51.717691] Call Trace:
[   51.720191]  [<ffffffff8161441c>] dump_stack+0x19/0x1b
[   51.725396]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
[   51.731473]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
[   51.737378]  [<ffffffff81025628>] native_smp_send_reschedule+0x58/0x60
[   51.744013]  [<ffffffff81072cfd>] wake_up_nohz_cpu+0x2d/0xa0
[   51.749745]  [<ffffffff8104f6bf>] add_timer_on+0x8f/0x110
[   51.755214]  [<ffffffff8105f6fe>] __queue_delayed_work+0x16e/0x1a0
[   51.761470]  [<ffffffff8105f251>] ? try_to_grab_pending+0xd1/0x1a0
[   51.767724]  [<ffffffff8105f78a>] mod_delayed_work_on+0x5a/0xa0
[   51.773719]  [<ffffffff814f6b5d>] gov_queue_work+0x4d/0xc0
[   51.779271]  [<ffffffff814f60cb>] od_dbs_timer+0xcb/0x170
[   51.784734]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
[   51.790634]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
[   51.796711]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
[   51.802350]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
[   51.808264]  [<ffffffff8106634a>] kthread+0xea/0xf0
[   51.813200]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   51.819644]  [<ffffffff81623d5c>] ret_from_fork+0x7c/0xb0
[   51.918165] nouveau E[     DRM] GPU lockup - switching to software fbcon
[   51.930505]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   51.936994] ---[ end trace f419538ada83b5c5 ]---
[   51.942915] ------------[ cut here ]------------
[   51.942928] ------------[ cut here ]------------
[   51.942936] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x58/0x60()
[   51.942974] Modules linked in: ext2 vfat fat loop snd_hda_codec_hdmi usbhid snd_hda_codec_realtek coretemp kvm_intel kvm snd_hda_intel snd_hda_codec crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel sb_edac aes_x86_64 ehci_pci snd_page_alloc glue_helper snd_timer xhci_hcd snd iTCO_wdt iTCO_vendor_support ehci_hcd edac_core lpc_ich acpi_cpufreq lrw gf128mul ablk_helper cryptd mperf usbcore usb_common soundcore mfd_core dcdbas evdev pcspkr processor i2c_i801 button microcode
[   51.942978] CPU: 5 PID: 740 Comm: kworker/3:2 Tainted: G        W    3.10.0-rc1+ #10
[   51.942979] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
[   51.942985] Workqueue: events od_dbs_timer
[   51.942990]  0000000000000009 ffff88043ab0db68 ffffffff8161441c ffff88043ab0dba8
[   51.942994]  ffffffff8103e540 000000003ab0dbf8 0000000000000003 ffff88043d708000
[   51.942998]  00000000ffff0d32 0000000000000003 ffff88044fccfc08 ffff88043ab0dbb8
[   51.942999] Call Trace:
[   51.943005]  [<ffffffff8161441c>] dump_stack+0x19/0x1b
[   51.943010]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
[   51.943014]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
[   51.943017]  [<ffffffff81025628>] native_smp_send_reschedule+0x58/0x60
[   51.943021]  [<ffffffff81072cfd>] wake_up_nohz_cpu+0x2d/0xa0
[   51.943027]  [<ffffffff8104f6bf>] add_timer_on+0x8f/0x110
[   51.943031]  [<ffffffff8105f6fe>] __queue_delayed_work+0x16e/0x1a0
[   51.943035]  [<ffffffff8105f251>] ? try_to_grab_pending+0xd1/0x1a0
[   51.943038]  [<ffffffff8105f78a>] mod_delayed_work_on+0x5a/0xa0
[   51.943043]  [<ffffffff814f6b5d>] gov_queue_work+0x4d/0xc0
[   51.943046]  [<ffffffff814f60cb>] od_dbs_timer+0xcb/0x170
[   51.943050]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
[   51.943053]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
[   51.943057]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
[   51.943060]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
[   51.943063]  [<ffffffff8106634a>] kthread+0xea/0xf0
[   51.943068]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   51.943071]  [<ffffffff81623d5c>] ret_from_fork+0x7c/0xb0
[   51.943074]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   51.943076] ---[ end trace f419538ada83b5c6 ]---
[   52.178461] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x58/0x60()
[   52.188097] Modules linked in: ext2 vfat fat loop snd_hda_codec_hdmi usbhid snd_hda_codec_realtek coretemp kvm_intel kvm snd_hda_intel snd_hda_codec crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel sb_edac aes_x86_64 ehci_pci snd_page_alloc glue_helper snd_timer xhci_hcd snd iTCO_wdt iTCO_vendor_support ehci_hcd edac_core lpc_ich acpi_cpufreq lrw gf128mul ablk_helper cryptd mperf usbcore usb_common soundcore mfd_core dcdbas evdev pcspkr processor i2c_i801 button microcode
[   52.238477] CPU: 0 PID: 85 Comm: kworker/2:1 Tainted: G        W    3.10.0-rc1+ #10
[   52.247669] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
[   52.256604] Workqueue: events od_dbs_timer
[   52.262219]  0000000000000009 ffff88043b62db68 ffffffff8161441c ffff88043b62dba8
[   52.271194]  ffffffff8103e540 0000000000000033 0000000000000002 ffff88043d6dc000
[   52.280163]  00000000ffff0d32 0000000000000002 ffff88044fc8fc08 ffff88043b62dbb8
[   52.289141] Call Trace:
[   52.293066]  [<ffffffff8161441c>] dump_stack+0x19/0x1b
[   52.299704]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
[   52.307213]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
[   52.314540]  [<ffffffff81025628>] native_smp_send_reschedule+0x58/0x60
[   52.322592]  [<ffffffff81072cfd>] wake_up_nohz_cpu+0x2d/0xa0
[   52.329763]  [<ffffffff8104f6bf>] add_timer_on+0x8f/0x110
[   52.336660]  [<ffffffff8105f6fe>] __queue_delayed_work+0x16e/0x1a0
[   52.344349]  [<ffffffff8105f251>] ? try_to_grab_pending+0xd1/0x1a0
[   52.352031]  [<ffffffff8105f78a>] mod_delayed_work_on+0x5a/0xa0
[   52.359458]  [<ffffffff814f6b5d>] gov_queue_work+0x4d/0xc0
[   52.366438]  [<ffffffff814f60cb>] od_dbs_timer+0xcb/0x170
[   52.373338]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
[   52.380670]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
[   52.388176]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
[   52.395247]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
[   52.402588]  [<ffffffff8106634a>] kthread+0xea/0xf0
[   52.408954]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   52.416830]  [<ffffffff81623d5c>] ret_from_fork+0x7c/0xb0
[   52.423722]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   52.431588] ---[ end trace f419538ada83b5c7 ]---
[   52.460411] ------------[ cut here ]------------
[   52.467744] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x58/0x60()
[   52.478684] Modules linked in: ext2 vfat fat loop snd_hda_codec_hdmi usbhid snd_hda_codec_realtek coretemp kvm_intel kvm snd_hda_intel snd_hda_codec crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel sb_edac aes_x86_64 ehci_pci snd_page_alloc glue_helper snd_timer xhci_hcd snd iTCO_wdt iTCO_vendor_support ehci_hcd edac_core lpc_ich acpi_cpufreq lrw gf128mul ablk_helper cryptd mperf usbcore usb_common soundcore mfd_core dcdbas evdev pcspkr processor i2c_i801 button microcode
[   52.533573] CPU: 5 PID: 740 Comm: kworker/3:2 Tainted: G        W    3.10.0-rc1+ #10
[   52.544303] ------------[ cut here ]------------
[   52.544305] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x58/0x60()
[   52.544317] Modules linked in: ext2 vfat fat loop snd_hda_codec_hdmi usbhid snd_hda_codec_realtek coretemp kvm_intel kvm snd_hda_intel snd_hda_codec crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel sb_edac aes_x86_64 ehci_pci snd_page_alloc glue_helper snd_timer xhci_hcd snd iTCO_wdt iTCO_vendor_support ehci_hcd edac_core lpc_ich acpi_cpufreq lrw gf128mul ablk_helper cryptd mperf usbcore usb_common soundcore mfd_core dcdbas evdev pcspkr processor i2c_i801 button microcode
[   52.544318] CPU: 0 PID: 71 Comm: kworker/4:1 Tainted: G        W    3.10.0-rc1+ #10
[   52.544318] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
[   52.544322] Workqueue: events od_dbs_timer
[   52.544324]  0000000000000009 ffff88043c271b68 ffffffff8161441c ffff88043c271ba8
[   52.544325]  ffffffff8103e540 0000000000000033 0000000000000004 ffff88043d738000
[   52.544326]  00000000ffff0dc8 0000000000000004 ffff88044fd0fc08 ffff88043c271bb8
[   52.544327] Call Trace:
[   52.544330]  [<ffffffff8161441c>] dump_stack+0x19/0x1b
[   52.544333]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
[   52.544334]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
[   52.544335]  [<ffffffff81025628>] native_smp_send_reschedule+0x58/0x60
[   52.544337]  [<ffffffff81072cfd>] wake_up_nohz_cpu+0x2d/0xa0
[   52.544340]  [<ffffffff8104f6bf>] add_timer_on+0x8f/0x110
[   52.544342]  [<ffffffff8105f6fe>] __queue_delayed_work+0x16e/0x1a0
[   52.544343]  [<ffffffff8105f251>] ? try_to_grab_pending+0xd1/0x1a0
[   52.544344]  [<ffffffff8105f78a>] mod_delayed_work_on+0x5a/0xa0
[   52.544346]  [<ffffffff814f6b5d>] gov_queue_work+0x4d/0xc0
[   52.544347]  [<ffffffff814f60cb>] od_dbs_timer+0xcb/0x170
[   52.544348]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
[   52.544349]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
[   52.544350]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
[   52.544351]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
[   52.544353]  [<ffffffff8106634a>] kthread+0xea/0xf0
[   52.544354]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   52.544356]  [<ffffffff81623d5c>] ret_from_fork+0x7c/0xb0
[   52.544357]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   52.544357] ---[ end trace f419538ada83b5c8 ]---
[   52.798038] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
[   52.806551] Workqueue: events od_dbs_timer
[   52.811736]  0000000000000009 ffff88043ab0db68 ffffffff8161441c ffff88043ab0dba8
[   52.820284]  ffffffff8103e540 0000000000000033 0000000000000003 ffff88043d708000
[   52.828828]  00000000ffff0db3 0000000000000003 ffff88044fccfc08 ffff88043ab0dbb8
[   52.837372] Call Trace:
[   52.840874]  [<ffffffff8161441c>] dump_stack+0x19/0x1b
[   52.847090]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
[   52.854176]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
[   52.861086]  [<ffffffff81025628>] native_smp_send_reschedule+0x58/0x60
[   52.868694]  [<ffffffff81072cfd>] wake_up_nohz_cpu+0x2d/0xa0
[   52.875432]  [<ffffffff8104f6bf>] add_timer_on+0x8f/0x110
[   52.881902]  [<ffffffff8105f6fe>] __queue_delayed_work+0x16e/0x1a0
[   52.889160]  [<ffffffff8105f251>] ? try_to_grab_pending+0xd1/0x1a0
[   52.896416]  [<ffffffff8105f78a>] mod_delayed_work_on+0x5a/0xa0
[   52.903409]  [<ffffffff814f6b5d>] gov_queue_work+0x4d/0xc0
[   52.909966]  [<ffffffff814f60cb>] od_dbs_timer+0xcb/0x170
[   52.916434]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
[   52.923342]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
[   52.930427]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
[   52.937074]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
[   52.943983]  [<ffffffff8106634a>] kthread+0xea/0xf0
[   52.949926]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   52.957370]  [<ffffffff81623d5c>] ret_from_fork+0x7c/0xb0
[   52.963841]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   52.971275] ---[ end trace f419538ada83b5c9 ]---
[   52.976979] nouveau W[   PFIFO][0000:03:00.0] unknown intr 0x00400000, ch 1
[   53.092122] ------------[ cut here ]------------
[   53.099585] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x58/0x60()
[   53.110571] Modules linked in: ext2 vfat fat loop snd_hda_codec_hdmi usbhid snd_hda_codec_realtek coretemp kvm_intel kvm snd_hda_intel snd_hda_codec crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel sb_edac aes_x86_64 ehci_pci snd_page_alloc glue_helper snd_timer xhci_hcd snd iTCO_wdt iTCO_vendor_support ehci_hcd edac_core lpc_ich acpi_cpufreq lrw gf128mul ablk_helper cryptd mperf usbcore usb_common soundcore mfd_core dcdbas evdev pcspkr processor i2c_i801 button microcode
[   53.165267] CPU: 0 PID: 123 Comm: kworker/5:1 Tainted: G        W    3.10.0-rc1+ #10
[   53.175902] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
[   53.186190] Workqueue: events od_dbs_timer
[   53.193136]  0000000000000009 ffff88043b277b68 ffffffff8161441c ffff88043b277ba8
[   53.203477]  ffffffff8103e540 000000003b277bb8 0000000000000005 ffff88043d764000
[   53.213727]  00000000ffff0e52 0000000000000005 ffff88044fd4fc08 ffff88043b277bb8
[   53.223894] Call Trace:
[   53.228887]  [<ffffffff8161441c>] dump_stack+0x19/0x1b
[   53.236593]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
[   53.245160]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
[   53.253519]  [<ffffffff81025628>] native_smp_send_reschedule+0x58/0x60
[   53.262582]  [<ffffffff81072cfd>] wake_up_nohz_cpu+0x2d/0xa0
[   53.270756]  [<ffffffff8104f6bf>] add_timer_on+0x8f/0x110
[   53.278654]  [<ffffffff8105f6fe>] __queue_delayed_work+0x16e/0x1a0
[   53.287335]  [<ffffffff8105f251>] ? try_to_grab_pending+0xd1/0x1a0
[   53.296002]  [<ffffffff8105f78a>] mod_delayed_work_on+0x5a/0xa0
[   53.304412]  [<ffffffff814f6b5d>] gov_queue_work+0x4d/0xc0
[   53.312388]  [<ffffffff814f60cb>] od_dbs_timer+0xcb/0x170
[   53.320267]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
[   53.328584]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
[   53.337083]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
[   53.345142]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
[   53.353484]  [<ffffffff8106634a>] kthread+0xea/0xf0
[   53.360847]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   53.369709]  [<ffffffff81623d5c>] ret_from_fork+0x7c/0xb0
[   53.377603]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   53.386474] ---[ end trace f419538ada83b5ca ]---
[   53.395276] Power down.
[   53.399033] acpi_power_off called

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-17 13:56           ` NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2 Borislav Petkov
@ 2013-05-20  3:16             ` Michael Wang
  2013-05-20  4:50               ` Borislav Petkov
  2013-06-04 21:20             ` Jiri Kosina
  1 sibling, 1 reply; 54+ messages in thread
From: Michael Wang @ 2013-05-20  3:16 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Paul E. McKenney, Jiri Kosina, Frederic Weisbecker, Tony Luck,
	linux-kernel, x86, Thomas Gleixner

Hi, Borislav

On 05/17/2013 09:56 PM, Borislav Petkov wrote:
[snip]
> [   51.737378]  [<ffffffff81025628>] native_smp_send_reschedule+0x58/0x60
> [   51.744013]  [<ffffffff81072cfd>] wake_up_nohz_cpu+0x2d/0xa0

I suppose the reason is that the cpu we passed to mod_delayed_work_on()
has a chance to become offline before we disabled irq, what about check
it before send resched ipi? like:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bfa7e77..d0e8f15 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -626,7 +626,7 @@ static bool wake_up_full_nohz_cpu(int cpu)

 void wake_up_nohz_cpu(int cpu)
 {
-       if (!wake_up_full_nohz_cpu(cpu))
+       if (cpu_online(cpu) && !wake_up_full_nohz_cpu(cpu))
                wake_up_idle_cpu(cpu);
 }

Regards,
Michael Wang


> [   51.749745]  [<ffffffff8104f6bf>] add_timer_on+0x8f/0x110
> [   51.755214]  [<ffffffff8105f6fe>] __queue_delayed_work+0x16e/0x1a0
> [   51.761470]  [<ffffffff8105f251>] ? try_to_grab_pending+0xd1/0x1a0
> [   51.767724]  [<ffffffff8105f78a>] mod_delayed_work_on+0x5a/0xa0
> [   51.773719]  [<ffffffff814f6b5d>] gov_queue_work+0x4d/0xc0
> [   51.779271]  [<ffffffff814f60cb>] od_dbs_timer+0xcb/0x170
> [   51.784734]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
> [   51.790634]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
> [   51.796711]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
> [   51.802350]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
> [   51.808264]  [<ffffffff8106634a>] kthread+0xea/0xf0
> [   51.813200]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   51.819644]  [<ffffffff81623d5c>] ret_from_fork+0x7c/0xb0
> [   51.918165] nouveau E[     DRM] GPU lockup - switching to software fbcon
> [   51.930505]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   51.936994] ---[ end trace f419538ada83b5c5 ]---
> [   51.942915] ------------[ cut here ]------------
> [   51.942928] ------------[ cut here ]------------
> [   51.942936] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x58/0x60()
> [   51.942974] Modules linked in: ext2 vfat fat loop snd_hda_codec_hdmi usbhid snd_hda_codec_realtek coretemp kvm_intel kvm snd_hda_intel snd_hda_codec crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel sb_edac aes_x86_64 ehci_pci snd_page_alloc glue_helper snd_timer xhci_hcd snd iTCO_wdt iTCO_vendor_support ehci_hcd edac_core lpc_ich acpi_cpufreq lrw gf128mul ablk_helper cryptd mperf usbcore usb_common soundcore mfd_core dcdbas evdev pcspkr processor i2c_i801 button microcode
> [   51.942978] CPU: 5 PID: 740 Comm: kworker/3:2 Tainted: G        W    3.10.0-rc1+ #10
> [   51.942979] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
> [   51.942985] Workqueue: events od_dbs_timer
> [   51.942990]  0000000000000009 ffff88043ab0db68 ffffffff8161441c ffff88043ab0dba8
> [   51.942994]  ffffffff8103e540 000000003ab0dbf8 0000000000000003 ffff88043d708000
> [   51.942998]  00000000ffff0d32 0000000000000003 ffff88044fccfc08 ffff88043ab0dbb8
> [   51.942999] Call Trace:
> [   51.943005]  [<ffffffff8161441c>] dump_stack+0x19/0x1b
> [   51.943010]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
> [   51.943014]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
> [   51.943017]  [<ffffffff81025628>] native_smp_send_reschedule+0x58/0x60
> [   51.943021]  [<ffffffff81072cfd>] wake_up_nohz_cpu+0x2d/0xa0
> [   51.943027]  [<ffffffff8104f6bf>] add_timer_on+0x8f/0x110
> [   51.943031]  [<ffffffff8105f6fe>] __queue_delayed_work+0x16e/0x1a0
> [   51.943035]  [<ffffffff8105f251>] ? try_to_grab_pending+0xd1/0x1a0
> [   51.943038]  [<ffffffff8105f78a>] mod_delayed_work_on+0x5a/0xa0
> [   51.943043]  [<ffffffff814f6b5d>] gov_queue_work+0x4d/0xc0
> [   51.943046]  [<ffffffff814f60cb>] od_dbs_timer+0xcb/0x170
> [   51.943050]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
> [   51.943053]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
> [   51.943057]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
> [   51.943060]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
> [   51.943063]  [<ffffffff8106634a>] kthread+0xea/0xf0
> [   51.943068]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   51.943071]  [<ffffffff81623d5c>] ret_from_fork+0x7c/0xb0
> [   51.943074]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   51.943076] ---[ end trace f419538ada83b5c6 ]---
> [   52.178461] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x58/0x60()
> [   52.188097] Modules linked in: ext2 vfat fat loop snd_hda_codec_hdmi usbhid snd_hda_codec_realtek coretemp kvm_intel kvm snd_hda_intel snd_hda_codec crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel sb_edac aes_x86_64 ehci_pci snd_page_alloc glue_helper snd_timer xhci_hcd snd iTCO_wdt iTCO_vendor_support ehci_hcd edac_core lpc_ich acpi_cpufreq lrw gf128mul ablk_helper cryptd mperf usbcore usb_common soundcore mfd_core dcdbas evdev pcspkr processor i2c_i801 button microcode
> [   52.238477] CPU: 0 PID: 85 Comm: kworker/2:1 Tainted: G        W    3.10.0-rc1+ #10
> [   52.247669] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
> [   52.256604] Workqueue: events od_dbs_timer
> [   52.262219]  0000000000000009 ffff88043b62db68 ffffffff8161441c ffff88043b62dba8
> [   52.271194]  ffffffff8103e540 0000000000000033 0000000000000002 ffff88043d6dc000
> [   52.280163]  00000000ffff0d32 0000000000000002 ffff88044fc8fc08 ffff88043b62dbb8
> [   52.289141] Call Trace:
> [   52.293066]  [<ffffffff8161441c>] dump_stack+0x19/0x1b
> [   52.299704]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
> [   52.307213]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
> [   52.314540]  [<ffffffff81025628>] native_smp_send_reschedule+0x58/0x60
> [   52.322592]  [<ffffffff81072cfd>] wake_up_nohz_cpu+0x2d/0xa0
> [   52.329763]  [<ffffffff8104f6bf>] add_timer_on+0x8f/0x110
> [   52.336660]  [<ffffffff8105f6fe>] __queue_delayed_work+0x16e/0x1a0
> [   52.344349]  [<ffffffff8105f251>] ? try_to_grab_pending+0xd1/0x1a0
> [   52.352031]  [<ffffffff8105f78a>] mod_delayed_work_on+0x5a/0xa0
> [   52.359458]  [<ffffffff814f6b5d>] gov_queue_work+0x4d/0xc0
> [   52.366438]  [<ffffffff814f60cb>] od_dbs_timer+0xcb/0x170
> [   52.373338]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
> [   52.380670]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
> [   52.388176]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
> [   52.395247]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
> [   52.402588]  [<ffffffff8106634a>] kthread+0xea/0xf0
> [   52.408954]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   52.416830]  [<ffffffff81623d5c>] ret_from_fork+0x7c/0xb0
> [   52.423722]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   52.431588] ---[ end trace f419538ada83b5c7 ]---
> [   52.460411] ------------[ cut here ]------------
> [   52.467744] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x58/0x60()
> [   52.478684] Modules linked in: ext2 vfat fat loop snd_hda_codec_hdmi usbhid snd_hda_codec_realtek coretemp kvm_intel kvm snd_hda_intel snd_hda_codec crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel sb_edac aes_x86_64 ehci_pci snd_page_alloc glue_helper snd_timer xhci_hcd snd iTCO_wdt iTCO_vendor_support ehci_hcd edac_core lpc_ich acpi_cpufreq lrw gf128mul ablk_helper cryptd mperf usbcore usb_common soundcore mfd_core dcdbas evdev pcspkr processor i2c_i801 button microcode
> [   52.533573] CPU: 5 PID: 740 Comm: kworker/3:2 Tainted: G        W    3.10.0-rc1+ #10
> [   52.544303] ------------[ cut here ]------------
> [   52.544305] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x58/0x60()
> [   52.544317] Modules linked in: ext2 vfat fat loop snd_hda_codec_hdmi usbhid snd_hda_codec_realtek coretemp kvm_intel kvm snd_hda_intel snd_hda_codec crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel sb_edac aes_x86_64 ehci_pci snd_page_alloc glue_helper snd_timer xhci_hcd snd iTCO_wdt iTCO_vendor_support ehci_hcd edac_core lpc_ich acpi_cpufreq lrw gf128mul ablk_helper cryptd mperf usbcore usb_common soundcore mfd_core dcdbas evdev pcspkr processor i2c_i801 button microcode
> [   52.544318] CPU: 0 PID: 71 Comm: kworker/4:1 Tainted: G        W    3.10.0-rc1+ #10
> [   52.544318] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
> [   52.544322] Workqueue: events od_dbs_timer
> [   52.544324]  0000000000000009 ffff88043c271b68 ffffffff8161441c ffff88043c271ba8
> [   52.544325]  ffffffff8103e540 0000000000000033 0000000000000004 ffff88043d738000
> [   52.544326]  00000000ffff0dc8 0000000000000004 ffff88044fd0fc08 ffff88043c271bb8
> [   52.544327] Call Trace:
> [   52.544330]  [<ffffffff8161441c>] dump_stack+0x19/0x1b
> [   52.544333]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
> [   52.544334]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
> [   52.544335]  [<ffffffff81025628>] native_smp_send_reschedule+0x58/0x60
> [   52.544337]  [<ffffffff81072cfd>] wake_up_nohz_cpu+0x2d/0xa0
> [   52.544340]  [<ffffffff8104f6bf>] add_timer_on+0x8f/0x110
> [   52.544342]  [<ffffffff8105f6fe>] __queue_delayed_work+0x16e/0x1a0
> [   52.544343]  [<ffffffff8105f251>] ? try_to_grab_pending+0xd1/0x1a0
> [   52.544344]  [<ffffffff8105f78a>] mod_delayed_work_on+0x5a/0xa0
> [   52.544346]  [<ffffffff814f6b5d>] gov_queue_work+0x4d/0xc0
> [   52.544347]  [<ffffffff814f60cb>] od_dbs_timer+0xcb/0x170
> [   52.544348]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
> [   52.544349]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
> [   52.544350]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
> [   52.544351]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
> [   52.544353]  [<ffffffff8106634a>] kthread+0xea/0xf0
> [   52.544354]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   52.544356]  [<ffffffff81623d5c>] ret_from_fork+0x7c/0xb0
> [   52.544357]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   52.544357] ---[ end trace f419538ada83b5c8 ]---
> [   52.798038] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
> [   52.806551] Workqueue: events od_dbs_timer
> [   52.811736]  0000000000000009 ffff88043ab0db68 ffffffff8161441c ffff88043ab0dba8
> [   52.820284]  ffffffff8103e540 0000000000000033 0000000000000003 ffff88043d708000
> [   52.828828]  00000000ffff0db3 0000000000000003 ffff88044fccfc08 ffff88043ab0dbb8
> [   52.837372] Call Trace:
> [   52.840874]  [<ffffffff8161441c>] dump_stack+0x19/0x1b
> [   52.847090]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
> [   52.854176]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
> [   52.861086]  [<ffffffff81025628>] native_smp_send_reschedule+0x58/0x60
> [   52.868694]  [<ffffffff81072cfd>] wake_up_nohz_cpu+0x2d/0xa0
> [   52.875432]  [<ffffffff8104f6bf>] add_timer_on+0x8f/0x110
> [   52.881902]  [<ffffffff8105f6fe>] __queue_delayed_work+0x16e/0x1a0
> [   52.889160]  [<ffffffff8105f251>] ? try_to_grab_pending+0xd1/0x1a0
> [   52.896416]  [<ffffffff8105f78a>] mod_delayed_work_on+0x5a/0xa0
> [   52.903409]  [<ffffffff814f6b5d>] gov_queue_work+0x4d/0xc0
> [   52.909966]  [<ffffffff814f60cb>] od_dbs_timer+0xcb/0x170
> [   52.916434]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
> [   52.923342]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
> [   52.930427]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
> [   52.937074]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
> [   52.943983]  [<ffffffff8106634a>] kthread+0xea/0xf0
> [   52.949926]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   52.957370]  [<ffffffff81623d5c>] ret_from_fork+0x7c/0xb0
> [   52.963841]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   52.971275] ---[ end trace f419538ada83b5c9 ]---
> [   52.976979] nouveau W[   PFIFO][0000:03:00.0] unknown intr 0x00400000, ch 1
> [   53.092122] ------------[ cut here ]------------
> [   53.099585] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x58/0x60()
> [   53.110571] Modules linked in: ext2 vfat fat loop snd_hda_codec_hdmi usbhid snd_hda_codec_realtek coretemp kvm_intel kvm snd_hda_intel snd_hda_codec crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel sb_edac aes_x86_64 ehci_pci snd_page_alloc glue_helper snd_timer xhci_hcd snd iTCO_wdt iTCO_vendor_support ehci_hcd edac_core lpc_ich acpi_cpufreq lrw gf128mul ablk_helper cryptd mperf usbcore usb_common soundcore mfd_core dcdbas evdev pcspkr processor i2c_i801 button microcode
> [   53.165267] CPU: 0 PID: 123 Comm: kworker/5:1 Tainted: G        W    3.10.0-rc1+ #10
> [   53.175902] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
> [   53.186190] Workqueue: events od_dbs_timer
> [   53.193136]  0000000000000009 ffff88043b277b68 ffffffff8161441c ffff88043b277ba8
> [   53.203477]  ffffffff8103e540 000000003b277bb8 0000000000000005 ffff88043d764000
> [   53.213727]  00000000ffff0e52 0000000000000005 ffff88044fd4fc08 ffff88043b277bb8
> [   53.223894] Call Trace:
> [   53.228887]  [<ffffffff8161441c>] dump_stack+0x19/0x1b
> [   53.236593]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
> [   53.245160]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
> [   53.253519]  [<ffffffff81025628>] native_smp_send_reschedule+0x58/0x60
> [   53.262582]  [<ffffffff81072cfd>] wake_up_nohz_cpu+0x2d/0xa0
> [   53.270756]  [<ffffffff8104f6bf>] add_timer_on+0x8f/0x110
> [   53.278654]  [<ffffffff8105f6fe>] __queue_delayed_work+0x16e/0x1a0
> [   53.287335]  [<ffffffff8105f251>] ? try_to_grab_pending+0xd1/0x1a0
> [   53.296002]  [<ffffffff8105f78a>] mod_delayed_work_on+0x5a/0xa0
> [   53.304412]  [<ffffffff814f6b5d>] gov_queue_work+0x4d/0xc0
> [   53.312388]  [<ffffffff814f60cb>] od_dbs_timer+0xcb/0x170
> [   53.320267]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
> [   53.328584]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
> [   53.337083]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
> [   53.345142]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
> [   53.353484]  [<ffffffff8106634a>] kthread+0xea/0xf0
> [   53.360847]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   53.369709]  [<ffffffff81623d5c>] ret_from_fork+0x7c/0xb0
> [   53.377603]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   53.386474] ---[ end trace f419538ada83b5ca ]---
> [   53.395276] Power down.
> [   53.399033] acpi_power_off called
> 


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20  3:16             ` Michael Wang
@ 2013-05-20  4:50               ` Borislav Petkov
  2013-05-20  6:23                 ` Michael Wang
  2013-05-20  8:10                 ` Frederic Weisbecker
  0 siblings, 2 replies; 54+ messages in thread
From: Borislav Petkov @ 2013-05-20  4:50 UTC (permalink / raw)
  To: Michael Wang
  Cc: Paul E. McKenney, Jiri Kosina, Frederic Weisbecker, Tony Luck,
	linux-kernel, x86, Thomas Gleixner

On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote:
> I suppose the reason is that the cpu we passed to
> mod_delayed_work_on() has a chance to become offline before we
> disabled irq, what about check it before send resched ipi? like:

I think this is only addressing the symptoms - what we should be doing
instead is asking ourselves why are we even scheduling work on a cpu if
the machine goes offline?

I don't know though who should be responsible for killing all that
work - the workqueue itself or the guy who created it, i.e. cpufreq
governor...

Hmmm.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20  4:50               ` Borislav Petkov
@ 2013-05-20  6:23                 ` Michael Wang
  2013-05-20  6:47                   ` Borislav Petkov
  2013-05-20  8:10                 ` Frederic Weisbecker
  1 sibling, 1 reply; 54+ messages in thread
From: Michael Wang @ 2013-05-20  6:23 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Paul E. McKenney, Jiri Kosina, Frederic Weisbecker, Tony Luck,
	linux-kernel, x86, Thomas Gleixner

On 05/20/2013 12:50 PM, Borislav Petkov wrote:
> On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote:
>> I suppose the reason is that the cpu we passed to
>> mod_delayed_work_on() has a chance to become offline before we
>> disabled irq, what about check it before send resched ipi? like:
> 
> I think this is only addressing the symptoms - what we should be doing
> instead is asking ourselves why are we even scheduling work on a cpu if
> the machine goes offline?
> 
> I don't know though who should be responsible for killing all that
> work - the workqueue itself or the guy who created it, i.e. cpufreq
> governor...

So there are two questions here:
1. Is gov_queue_work() want to queue the work on offline cpu?
2. Is mod_delayed_work_on() allow offline cpu?

I guess both should be false?

Regards,
Michael Wang

> 
> Hmmm.
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20  6:23                 ` Michael Wang
@ 2013-05-20  6:47                   ` Borislav Petkov
  2013-05-20  6:58                     ` Michael Wang
  2013-05-20  7:36                     ` Tejun Heo
  0 siblings, 2 replies; 54+ messages in thread
From: Borislav Petkov @ 2013-05-20  6:47 UTC (permalink / raw)
  To: Michael Wang, Tejun Heo
  Cc: Paul E. McKenney, Jiri Kosina, Frederic Weisbecker, Tony Luck,
	linux-kernel, x86, Thomas Gleixner

On Mon, May 20, 2013 at 02:23:37PM +0800, Michael Wang wrote:
> On 05/20/2013 12:50 PM, Borislav Petkov wrote:
> > On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote:
> >> I suppose the reason is that the cpu we passed to
> >> mod_delayed_work_on() has a chance to become offline before we
> >> disabled irq, what about check it before send resched ipi? like:
> > 
> > I think this is only addressing the symptoms - what we should be doing
> > instead is asking ourselves why are we even scheduling work on a cpu if
> > the machine goes offline?
> > 
> > I don't know though who should be responsible for killing all that
> > work - the workqueue itself or the guy who created it, i.e. cpufreq
> > governor...
> 
> So there are two questions here:
> 1. Is gov_queue_work() want to queue the work on offline cpu?
> 2. Is mod_delayed_work_on() allow offline cpu?
> 
> I guess both should be false?

Well, if we don't allow queueing work on a cpu which goes offline, i.e.
#2, the problem should be solved.

Tejun?

Here are the splats: http://marc.info/?l=linux-kernel&m=136879901425951

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20  6:47                   ` Borislav Petkov
@ 2013-05-20  6:58                     ` Michael Wang
  2013-05-20  7:06                       ` Michael Wang
  2013-05-20  7:36                     ` Tejun Heo
  1 sibling, 1 reply; 54+ messages in thread
From: Michael Wang @ 2013-05-20  6:58 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tejun Heo, Paul E. McKenney, Jiri Kosina, Frederic Weisbecker,
	Tony Luck, linux-kernel, x86, Thomas Gleixner

On 05/20/2013 02:47 PM, Borislav Petkov wrote:
> On Mon, May 20, 2013 at 02:23:37PM +0800, Michael Wang wrote:
>> On 05/20/2013 12:50 PM, Borislav Petkov wrote:
>>> On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote:
>>>> I suppose the reason is that the cpu we passed to
>>>> mod_delayed_work_on() has a chance to become offline before we
>>>> disabled irq, what about check it before send resched ipi? like:
>>>
>>> I think this is only addressing the symptoms - what we should be doing
>>> instead is asking ourselves why are we even scheduling work on a cpu if
>>> the machine goes offline?
>>>
>>> I don't know though who should be responsible for killing all that
>>> work - the workqueue itself or the guy who created it, i.e. cpufreq
>>> governor...
>>
>> So there are two questions here:
>> 1. Is gov_queue_work() want to queue the work on offline cpu?
>> 2. Is mod_delayed_work_on() allow offline cpu?
>>
>> I guess both should be false?
> 
> Well, if we don't allow queueing work on a cpu which goes offline, i.e.
> #2, the problem should be solved.

I've take a look at the usage of queue_delayed_work_on() and
mod_delayed_work_on(), mostly passed this_cpu, or those in online mask,
I think offline cpu is not by designed.

Besides, the cpu gov_queue_work() is using 'policy->cpus' which seems to
be updated during UP DOWN notify, I think they are supposed to be online.

But we need expert in cpufreq to confirm all these...

Regards,
Michael Wang

> 
> Tejun?
> 
> Here are the splats: http://marc.info/?l=linux-kernel&m=136879901425951
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20  6:58                     ` Michael Wang
@ 2013-05-20  7:06                       ` Michael Wang
  2013-05-20  7:12                         ` Viresh Kumar
  0 siblings, 1 reply; 54+ messages in thread
From: Michael Wang @ 2013-05-20  7:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tejun Heo, Paul E. McKenney, Jiri Kosina, Frederic Weisbecker,
	Tony Luck, linux-kernel, x86, Thomas Gleixner, rjw, Viresh Kumar,
	cpufreq, linux-pm

On 05/20/2013 02:58 PM, Michael Wang wrote:
> On 05/20/2013 02:47 PM, Borislav Petkov wrote:
>> On Mon, May 20, 2013 at 02:23:37PM +0800, Michael Wang wrote:
>>> On 05/20/2013 12:50 PM, Borislav Petkov wrote:
>>>> On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote:
>>>>> I suppose the reason is that the cpu we passed to
>>>>> mod_delayed_work_on() has a chance to become offline before we
>>>>> disabled irq, what about check it before send resched ipi? like:
>>>>
>>>> I think this is only addressing the symptoms - what we should be doing
>>>> instead is asking ourselves why are we even scheduling work on a cpu if
>>>> the machine goes offline?
>>>>
>>>> I don't know though who should be responsible for killing all that
>>>> work - the workqueue itself or the guy who created it, i.e. cpufreq
>>>> governor...
>>>
>>> So there are two questions here:
>>> 1. Is gov_queue_work() want to queue the work on offline cpu?
>>> 2. Is mod_delayed_work_on() allow offline cpu?
>>>
>>> I guess both should be false?
>>
>> Well, if we don't allow queueing work on a cpu which goes offline, i.e.
>> #2, the problem should be solved.
> 
> I've take a look at the usage of queue_delayed_work_on() and
> mod_delayed_work_on(), mostly passed this_cpu, or those in online mask,
> I think offline cpu is not by designed.
> 
> Besides, the cpu gov_queue_work() is using 'policy->cpus' which seems to
> be updated during UP DOWN notify, I think they are supposed to be online.
> 
> But we need expert in cpufreq to confirm all these...

And I guess this may help to reduce the chance to trigger WARN:

diff --git a/drivers/cpufreq/cpufreq_governor.c
b/drivers/cpufreq/cpufreq_governor.c
index 443442d..0f96013 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -180,7 +180,7 @@ void gov_queue_work(struct dbs_data *dbs_data,
struct cpufreq_policy *policy,
        if (!all_cpus) {
                __gov_queue_work(smp_processor_id(), dbs_data, delay);
        } else {
-               for_each_cpu(i, policy->cpus)
+               for_each_cpu_and(i, policy->cpus, cpu_online_mask)
                        __gov_queue_work(i, dbs_data, delay);
        }
 }

Well, disable irq will be better, anyway...still need folks who own that
driver to make the decision, so let's CC them :)

Regards,
Michael Wang


> 
> Regards,
> Michael Wang
> 
>>
>> Tejun?
>>
>> Here are the splats: http://marc.info/?l=linux-kernel&m=136879901425951
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20  7:06                       ` Michael Wang
@ 2013-05-20  7:12                         ` Viresh Kumar
  2013-05-20  7:25                           ` Michael Wang
  0 siblings, 1 reply; 54+ messages in thread
From: Viresh Kumar @ 2013-05-20  7:12 UTC (permalink / raw)
  To: Michael Wang
  Cc: Borislav Petkov, Tejun Heo, Paul E. McKenney, Jiri Kosina,
	Frederic Weisbecker, Tony Luck, linux-kernel, x86,
	Thomas Gleixner, rjw, cpufreq, linux-pm

Hi Michael,

I haven't followed this mail chain earlier and saw this mail only as I am
added in cc now. I probably have answers to few questions here:

On 20 May 2013 12:36, Michael Wang <wangyun@linux.vnet.ibm.com> wrote:
> On 05/20/2013 02:58 PM, Michael Wang wrote:
>> On 05/20/2013 02:47 PM, Borislav Petkov wrote:
>>> On Mon, May 20, 2013 at 02:23:37PM +0800, Michael Wang wrote:
>>>> On 05/20/2013 12:50 PM, Borislav Petkov wrote:
>>>> So there are two questions here:
>>>> 1. Is gov_queue_work() want to queue the work on offline cpu?

No. We are only working with online cpus now in cpufreq core and governors.

>> Besides, the cpu gov_queue_work() is using 'policy->cpus' which seems to
>> be updated during UP DOWN notify, I think they are supposed to be online.
>>
>> But we need expert in cpufreq to confirm all these...

I confirm this. policy->cpus only contains online cpus.. and
policy->related_cpus
always contain online+offline cpus.

> And I guess this may help to reduce the chance to trigger WARN:
>
> diff --git a/drivers/cpufreq/cpufreq_governor.c
> b/drivers/cpufreq/cpufreq_governor.c
> index 443442d..0f96013 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -180,7 +180,7 @@ void gov_queue_work(struct dbs_data *dbs_data,
> struct cpufreq_policy *policy,
>         if (!all_cpus) {
>                 __gov_queue_work(smp_processor_id(), dbs_data, delay);
>         } else {
> -               for_each_cpu(i, policy->cpus)
> +               for_each_cpu_and(i, policy->cpus, cpu_online_mask)
>                         __gov_queue_work(i, dbs_data, delay);
>         }
>  }

Not required at all... policy->cpus is guaranteed to have only online cpus.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20  7:12                         ` Viresh Kumar
@ 2013-05-20  7:25                           ` Michael Wang
  2013-05-20  8:56                             ` Michael Wang
  0 siblings, 1 reply; 54+ messages in thread
From: Michael Wang @ 2013-05-20  7:25 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Borislav Petkov, Tejun Heo, Paul E. McKenney, Jiri Kosina,
	Frederic Weisbecker, Tony Luck, linux-kernel, x86,
	Thomas Gleixner, rjw, cpufreq, linux-pm

Hi, Viresh
On 05/20/2013 03:12 PM, Viresh Kumar wrote:
> Hi Michael,
> 
> I haven't followed this mail chain earlier and saw this mail only as I am
> added in cc now. I probably have answers to few questions here:

Thanks for your quick respond :)

> 
> On 20 May 2013 12:36, Michael Wang <wangyun@linux.vnet.ibm.com> wrote:
>> On 05/20/2013 02:58 PM, Michael Wang wrote:
>>> On 05/20/2013 02:47 PM, Borislav Petkov wrote:
>>>> On Mon, May 20, 2013 at 02:23:37PM +0800, Michael Wang wrote:
>>>>> On 05/20/2013 12:50 PM, Borislav Petkov wrote:
>>>>> So there are two questions here:
>>>>> 1. Is gov_queue_work() want to queue the work on offline cpu?
> 
> No. We are only working with online cpus now in cpufreq core and governors.
> 
>>> Besides, the cpu gov_queue_work() is using 'policy->cpus' which seems to
>>> be updated during UP DOWN notify, I think they are supposed to be online.
>>>
>>> But we need expert in cpufreq to confirm all these...
> 
> I confirm this. policy->cpus only contains online cpus.. and
> policy->related_cpus
> always contain online+offline cpus.

Nice to be confirmed :)

> 
>> And I guess this may help to reduce the chance to trigger WARN:
>>
>> diff --git a/drivers/cpufreq/cpufreq_governor.c
>> b/drivers/cpufreq/cpufreq_governor.c
>> index 443442d..0f96013 100644
>> --- a/drivers/cpufreq/cpufreq_governor.c
>> +++ b/drivers/cpufreq/cpufreq_governor.c
>> @@ -180,7 +180,7 @@ void gov_queue_work(struct dbs_data *dbs_data,
>> struct cpufreq_policy *policy,
>>         if (!all_cpus) {
>>                 __gov_queue_work(smp_processor_id(), dbs_data, delay);
>>         } else {
>> -               for_each_cpu(i, policy->cpus)
>> +               for_each_cpu_and(i, policy->cpus, cpu_online_mask)
>>                         __gov_queue_work(i, dbs_data, delay);
>>         }
>>  }
> 
> Not required at all... policy->cpus is guaranteed to have only online cpus.

Yeah, that's right, I guess the issue is, although the policy->cpus is
correct at a given time, after get cpu from it, it's possible to be
changed, unless we disabled preempt or irq, or hotplug before we use it...

Like such issue cases:
				get x from policy->cpus
	DOWN notifier
	change policy->cpus
	do offline x
				send ipi to x

Will that happen?

Regards,
Michael Wang


> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20  6:47                   ` Borislav Petkov
  2013-05-20  6:58                     ` Michael Wang
@ 2013-05-20  7:36                     ` Tejun Heo
  1 sibling, 0 replies; 54+ messages in thread
From: Tejun Heo @ 2013-05-20  7:36 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Michael Wang, Paul E. McKenney, Jiri Kosina, Frederic Weisbecker,
	Tony Luck, linux-kernel, x86, Thomas Gleixner

Hello,

On Mon, May 20, 2013 at 08:47:27AM +0200, Borislav Petkov wrote:
> > So there are two questions here:
> > 1. Is gov_queue_work() want to queue the work on offline cpu?
> > 2. Is mod_delayed_work_on() allow offline cpu?
> > 
> > I guess both should be false?
> 
> Well, if we don't allow queueing work on a cpu which goes offline, i.e.
> #2, the problem should be solved.
> 
> Tejun?

Workqueue proper would behave okish (no affinity of course but the
work items are still executed) but mod_delayed_work_on() adds a timer
on the target CPU and I think adding a timer on an offline cpu would
be a problem.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20  4:50               ` Borislav Petkov
  2013-05-20  6:23                 ` Michael Wang
@ 2013-05-20  8:10                 ` Frederic Weisbecker
  2013-05-20  9:31                   ` Srivatsa S. Bhat
  1 sibling, 1 reply; 54+ messages in thread
From: Frederic Weisbecker @ 2013-05-20  8:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Michael Wang, Paul E. McKenney, Jiri Kosina, Tony Luck,
	linux-kernel, x86, Thomas Gleixner

2013/5/20 Borislav Petkov <bp@alien8.de>:
> On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote:
>> I suppose the reason is that the cpu we passed to
>> mod_delayed_work_on() has a chance to become offline before we
>> disabled irq, what about check it before send resched ipi? like:
>
> I think this is only addressing the symptoms - what we should be doing
> instead is asking ourselves why are we even scheduling work on a cpu if
> the machine goes offline?
>
> I don't know though who should be responsible for killing all that
> work - the workqueue itself or the guy who created it, i.e. cpufreq
> governor...
>
> Hmmm.

Let's look at this portion of cpu_down():

	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
	if (err) {
		/* CPU didn't die: tell everyone.  Can't complain. */
		smpboot_unpark_threads(cpu);
		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
		goto out_release;
	}
	BUG_ON(cpu_online(cpu));

	/*
	 * The migration_call() CPU_DYING callback will have removed all
	 * runnable tasks from the cpu, there's only the idle task left now
	 * that the migration thread is done doing the stop_machine thing.
	 *
	 * Wait for the stop thread to go away.
	 */
	while (!idle_cpu(cpu))
		cpu_relax();
	/* This actually kills the CPU. */
	__cpu_die(cpu);

	/* CPU is completely dead: tell everyone.  Too late to complain. */
	cpu_notify_nofail(CPU_DEAD | mod, hcpu);

	check_for_tasks(cpu);

The CPU is considered offline after the take_cpu_down stop machine job
completes. But the struct timer_list timers are migrated later through
CPU_DEAD notification. Only once that's completed we check for illegal
residual tasks in the CPU.  So there is a little window between the
stop machine thing and __cpu_die() where a timer can fire with
cpu_online(cpu) == 1.

Now concerning the workqueue I don't know. I guess the per cpu ones
are not migrated due to their affinity. Apparently they can still wake
up and execute works due to the timers...

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20  7:25                           ` Michael Wang
@ 2013-05-20  8:56                             ` Michael Wang
  2013-05-20  9:09                               ` Viresh Kumar
  0 siblings, 1 reply; 54+ messages in thread
From: Michael Wang @ 2013-05-20  8:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Viresh Kumar, Tejun Heo, Paul E. McKenney, Jiri Kosina,
	Frederic Weisbecker, Tony Luck, linux-kernel, x86,
	Thomas Gleixner, rjw, cpufreq, linux-pm

On 05/20/2013 03:25 PM, Michael Wang wrote:
[]
> 
> Yeah, that's right, I guess the issue is, although the policy->cpus is
> correct at a given time, after get cpu from it, it's possible to be
> changed, unless we disabled preempt or irq, or hotplug before we use it...
> 
> Like such issue cases:
> 				get x from policy->cpus
> 	DOWN notifier
> 	change policy->cpus
> 	do offline x
> 				send ipi to x
> 
> Will that happen?

May be we could do some test to confirm it?

diff --git a/drivers/cpufreq/cpufreq_governor.c
b/drivers/cpufreq/cpufreq_governor.c
index 443442d..449be88 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -26,6 +26,7 @@
 #include <linux/tick.h>
 #include <linux/types.h>
 #include <linux/workqueue.h>
+#include <linux/cpu.h>

 #include "cpufreq_governor.h"

@@ -180,8 +181,10 @@ void gov_queue_work(struct dbs_data *dbs_data,
struct cpufreq_policy *policy,
        if (!all_cpus) {
                __gov_queue_work(smp_processor_id(), dbs_data, delay);
        } else {
+               get_online_cpus();
                for_each_cpu(i, policy->cpus)
                        __gov_queue_work(i, dbs_data, delay);
+               put_online_cpus();
        }
 }
 EXPORT_SYMBOL_GPL(gov_queue_work);

This is supposed to make WARN disappear, if it works, then BINGO :)

Regards,
Michael Wang

> 
> Regards,
> Michael Wang
> 
> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20  8:56                             ` Michael Wang
@ 2013-05-20  9:09                               ` Viresh Kumar
  2013-05-20  9:24                                 ` Michael Wang
  0 siblings, 1 reply; 54+ messages in thread
From: Viresh Kumar @ 2013-05-20  9:09 UTC (permalink / raw)
  To: Michael Wang
  Cc: Borislav Petkov, Tejun Heo, Paul E. McKenney, Jiri Kosina,
	Frederic Weisbecker, Tony Luck, linux-kernel, x86,
	Thomas Gleixner, rjw, cpufreq, linux-pm

On 20 May 2013 14:26, Michael Wang <wangyun@linux.vnet.ibm.com> wrote:
> On 05/20/2013 03:25 PM, Michael Wang wrote:
>> Yeah, that's right, I guess the issue is, although the policy->cpus is
>> correct at a given time, after get cpu from it, it's possible to be
>> changed, unless we disabled preempt or irq, or hotplug before we use it...
>>
>> Like such issue cases:
>>                               get x from policy->cpus
>>       DOWN notifier
>>       change policy->cpus
>>       do offline x
>>                               send ipi to x
>>
>> Will that happen?

Sorry I am not sure. :(

I can see mutex being used in cpufreq_governor.c which should take care
of race conditions...

> May be we could do some test to confirm it?
>
> diff --git a/drivers/cpufreq/cpufreq_governor.c
> b/drivers/cpufreq/cpufreq_governor.c
> index 443442d..449be88 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -26,6 +26,7 @@
>  #include <linux/tick.h>
>  #include <linux/types.h>
>  #include <linux/workqueue.h>
> +#include <linux/cpu.h>
>
>  #include "cpufreq_governor.h"
>
> @@ -180,8 +181,10 @@ void gov_queue_work(struct dbs_data *dbs_data,
> struct cpufreq_policy *policy,
>         if (!all_cpus) {
>                 __gov_queue_work(smp_processor_id(), dbs_data, delay);
>         } else {
> +               get_online_cpus();
>                 for_each_cpu(i, policy->cpus)
>                         __gov_queue_work(i, dbs_data, delay);
> +               put_online_cpus();
>         }
>  }
>  EXPORT_SYMBOL_GPL(gov_queue_work);
>
> This is supposed to make WARN disappear, if it works, then BINGO :)

Let people test it and then we can talk :)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20  9:09                               ` Viresh Kumar
@ 2013-05-20  9:24                                 ` Michael Wang
  2013-05-20 13:23                                   ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Michael Wang @ 2013-05-20  9:24 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Borislav Petkov, Tejun Heo, Paul E. McKenney, Jiri Kosina,
	Frederic Weisbecker, Tony Luck, linux-kernel, x86,
	Thomas Gleixner, rjw, cpufreq, linux-pm

On 05/20/2013 05:09 PM, Viresh Kumar wrote:
> On 20 May 2013 14:26, Michael Wang <wangyun@linux.vnet.ibm.com> wrote:
>> On 05/20/2013 03:25 PM, Michael Wang wrote:
>>> Yeah, that's right, I guess the issue is, although the policy->cpus is
>>> correct at a given time, after get cpu from it, it's possible to be
>>> changed, unless we disabled preempt or irq, or hotplug before we use it...
>>>
>>> Like such issue cases:
>>>                               get x from policy->cpus
>>>       DOWN notifier
>>>       change policy->cpus
>>>       do offline x
>>>                               send ipi to x
>>>
>>> Will that happen?
> 
> Sorry I am not sure. :(
> 
> I can see mutex being used in cpufreq_governor.c which should take care
> of race conditions...
> 
>> May be we could do some test to confirm it?
>>
>> diff --git a/drivers/cpufreq/cpufreq_governor.c
>> b/drivers/cpufreq/cpufreq_governor.c
>> index 443442d..449be88 100644
>> --- a/drivers/cpufreq/cpufreq_governor.c
>> +++ b/drivers/cpufreq/cpufreq_governor.c
>> @@ -26,6 +26,7 @@
>>  #include <linux/tick.h>
>>  #include <linux/types.h>
>>  #include <linux/workqueue.h>
>> +#include <linux/cpu.h>
>>
>>  #include "cpufreq_governor.h"
>>
>> @@ -180,8 +181,10 @@ void gov_queue_work(struct dbs_data *dbs_data,
>> struct cpufreq_policy *policy,
>>         if (!all_cpus) {
>>                 __gov_queue_work(smp_processor_id(), dbs_data, delay);
>>         } else {
>> +               get_online_cpus();
>>                 for_each_cpu(i, policy->cpus)
>>                         __gov_queue_work(i, dbs_data, delay);
>> +               put_online_cpus();
>>         }
>>  }
>>  EXPORT_SYMBOL_GPL(gov_queue_work);
>>
>> This is supposed to make WARN disappear, if it works, then BINGO :)
> 
> Let people test it and then we can talk :)

Agree :)

Borislav, would you like to take a try?

If this fix cause other troubles, you could try get_cpu() or disable irq
also.

Regards,
Michael Wang

> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20  8:10                 ` Frederic Weisbecker
@ 2013-05-20  9:31                   ` Srivatsa S. Bhat
  2013-05-20  9:40                     ` Viresh Kumar
  0 siblings, 1 reply; 54+ messages in thread
From: Srivatsa S. Bhat @ 2013-05-20  9:31 UTC (permalink / raw)
  To: Frederic Weisbecker, Borislav Petkov, Viresh Kumar
  Cc: Michael Wang, Paul E. McKenney, Jiri Kosina, Tony Luck,
	linux-kernel, x86, Thomas Gleixner, Rafael J. Wysocki,
	Linux PM mailing list, Tejun Heo

On 05/20/2013 01:40 PM, Frederic Weisbecker wrote:
> 2013/5/20 Borislav Petkov <bp@alien8.de>:
>> On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote:
>>> I suppose the reason is that the cpu we passed to
>>> mod_delayed_work_on() has a chance to become offline before we
>>> disabled irq, what about check it before send resched ipi? like:
>>
>> I think this is only addressing the symptoms - what we should be doing
>> instead is asking ourselves why are we even scheduling work on a cpu if
>> the machine goes offline?
>>
>> I don't know though who should be responsible for killing all that
>> work - the workqueue itself or the guy who created it, i.e. cpufreq
>> governor...
>>
>> Hmmm.
> 
> Let's look at this portion of cpu_down():
> 
> 	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
> 	if (err) {
> 		/* CPU didn't die: tell everyone.  Can't complain. */
> 		smpboot_unpark_threads(cpu);
> 		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
> 		goto out_release;
> 	}
> 	BUG_ON(cpu_online(cpu));
> 
> 	/*
> 	 * The migration_call() CPU_DYING callback will have removed all
> 	 * runnable tasks from the cpu, there's only the idle task left now
> 	 * that the migration thread is done doing the stop_machine thing.
> 	 *
> 	 * Wait for the stop thread to go away.
> 	 */
> 	while (!idle_cpu(cpu))
> 		cpu_relax();
> 	/* This actually kills the CPU. */
> 	__cpu_die(cpu);
> 
> 	/* CPU is completely dead: tell everyone.  Too late to complain. */
> 	cpu_notify_nofail(CPU_DEAD | mod, hcpu);
> 
> 	check_for_tasks(cpu);
> 
> The CPU is considered offline after the take_cpu_down stop machine job
> completes. But the struct timer_list timers are migrated later through
> CPU_DEAD notification. Only once that's completed we check for illegal
> residual tasks in the CPU.  So there is a little window between the
> stop machine thing and __cpu_die() where a timer can fire with
> cpu_online(cpu) == 1.
> 

Nope, the dying CPU is removed from the cpu_online_mask in the very first
stages of stop_machine(), specifically in the __cpu_disable() function.
__cpu_die() is just a dummy.

> Now concerning the workqueue I don't know. I guess the per cpu ones
> are not migrated due to their affinity. Apparently they can still wake
> up and execute works due to the timers...

The interesting thing is that the cpufreq governor actually _cancels_ the
queued work in CPU_DOWN_PREPARE stage, as far as I understand.

cpufreq_cpu_callback()
  -> __cpufreq_remove_dev()
    -> __cpufreq_governor(data, CPUFREQ_GOV_STOP);
      -> od_cpufreq_governor_dbs()
	 -> cpufreq_governor_dbs(), which has the following case statement:


        case CPUFREQ_GOV_STOP:
                if (dbs_data->cdata->governor == GOV_CONSERVATIVE)
                        cs_dbs_info->enable = 0;

                gov_cancel_work(dbs_data, policy);

                mutex_lock(&dbs_data->mutex);
                mutex_destroy(&cpu_cdbs->timer_mutex);

                mutex_unlock(&dbs_data->mutex);

                break;


But recently I removed the call to __cpufreq_remove_dev() in the suspend/resume
path (tasks frozen), in commit a66b2e503 (cpufreq: Preserve sysfs files across
suspend/resume). So I'm curious to know if this is affecting in any way.

So Boris, do you see the warnings during regular hotplug also (via sysfs) or
only during suspend/shutdown? [Actually shutdown doesn't freeze tasks, so that is
already a hint that this warning can be triggered via sysfs also, but it would
be good to get a confirmation.]

And Viresh, in the regular hotplug paths, the call to gov_cancel_work() is
supposed to kill any pending workqueue functions pertaining to offline CPUs
right? Could there be a synchronization bug somewhere due to which this
might not be happening properly?

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20  9:31                   ` Srivatsa S. Bhat
@ 2013-05-20  9:40                     ` Viresh Kumar
  2013-05-20 10:24                       ` Viresh Kumar
  0 siblings, 1 reply; 54+ messages in thread
From: Viresh Kumar @ 2013-05-20  9:40 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Frederic Weisbecker, Borislav Petkov, Michael Wang,
	Paul E. McKenney, Jiri Kosina, Tony Luck, linux-kernel, x86,
	Thomas Gleixner, Rafael J. Wysocki, Linux PM mailing list,
	Tejun Heo

On 20 May 2013 15:01, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote:
> And Viresh, in the regular hotplug paths, the call to gov_cancel_work() is
> supposed to kill any pending workqueue functions pertaining to offline CPUs
> right?

Yes.. It will cancel work for all cpus first and will start again for
online cpus again.

> Could there be a synchronization bug somewhere due to which this
> might not be happening properly?

Not sure.. I have seen mutex's are used well in cpufreq_governor.. don't know
if I missed something.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20  9:40                     ` Viresh Kumar
@ 2013-05-20 10:24                       ` Viresh Kumar
  0 siblings, 0 replies; 54+ messages in thread
From: Viresh Kumar @ 2013-05-20 10:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Srivatsa S. Bhat, Frederic Weisbecker, Michael Wang,
	Paul E. McKenney, Jiri Kosina, Tony Luck, linux-kernel, x86,
	Thomas Gleixner, Rafael J. Wysocki, Linux PM mailing list,
	Tejun Heo

On 20 May 2013 15:10, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 20 May 2013 15:01, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> And Viresh, in the regular hotplug paths, the call to gov_cancel_work() is
>> supposed to kill any pending workqueue functions pertaining to offline CPUs
>> right?
>
> Yes.. It will cancel work for all cpus first and will start again for
> online cpus again.
>
>> Could there be a synchronization bug somewhere due to which this
>> might not be happening properly?
>
> Not sure.. I have seen mutex's are used well in cpufreq_governor.. don't know
> if I missed something.

Borislav,

Can you try below change to see if the issue is still present? (Untested)

diff --git a/drivers/cpufreq/cpufreq_governor.c
b/drivers/cpufreq/cpufreq_governor.c
index 7532570..b9ae5f6 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -359,7 +359,9 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
                if (dbs_data->cdata->governor == GOV_CONSERVATIVE)
                        cs_dbs_info->enable = 0;

+               mutex_lock(&cpu_cdbs->timer_mutex);
                gov_cancel_work(dbs_data, policy);
+               mutex_unlock(&cpu_cdbs->timer_mutex);

                mutex_lock(&dbs_data->mutex);
                mutex_destroy(&cpu_cdbs->timer_mutex);

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20  9:24                                 ` Michael Wang
@ 2013-05-20 13:23                                   ` Borislav Petkov
  2013-05-20 13:43                                     ` Viresh Kumar
  2013-05-21  2:20                                     ` Michael Wang
  0 siblings, 2 replies; 54+ messages in thread
From: Borislav Petkov @ 2013-05-20 13:23 UTC (permalink / raw)
  To: Michael Wang
  Cc: Viresh Kumar, Tejun Heo, Paul E. McKenney, Jiri Kosina,
	Frederic Weisbecker, Tony Luck, linux-kernel, x86,
	Thomas Gleixner, rjw, cpufreq, linux-pm

On Mon, May 20, 2013 at 05:24:05PM +0800, Michael Wang wrote:
> >> diff --git a/drivers/cpufreq/cpufreq_governor.c
> >> b/drivers/cpufreq/cpufreq_governor.c
> >> index 443442d..449be88 100644
> >> --- a/drivers/cpufreq/cpufreq_governor.c
> >> +++ b/drivers/cpufreq/cpufreq_governor.c
> >> @@ -26,6 +26,7 @@
> >>  #include <linux/tick.h>
> >>  #include <linux/types.h>
> >>  #include <linux/workqueue.h>
> >> +#include <linux/cpu.h>
> >>
> >>  #include "cpufreq_governor.h"
> >>
> >> @@ -180,8 +181,10 @@ void gov_queue_work(struct dbs_data *dbs_data,
> >> struct cpufreq_policy *policy,
> >>         if (!all_cpus) {
> >>                 __gov_queue_work(smp_processor_id(), dbs_data, delay);
> >>         } else {
> >> +               get_online_cpus();
> >>                 for_each_cpu(i, policy->cpus)
> >>                         __gov_queue_work(i, dbs_data, delay);
> >> +               put_online_cpus();
> >>         }
> >>  }
> >>  EXPORT_SYMBOL_GPL(gov_queue_work);
> >>
> >> This is supposed to make WARN disappear, if it works, then BINGO :)
> > 
> > Let people test it and then we can talk :)
> 
> Agree :)
> 
> Borislav, would you like to take a try?
> 
> If this fix cause other troubles, you could try get_cpu() or disable irq
> also.

I just confirmed that policy->cpus contains offlined cores with this:

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 5af40ad82d23..e8c25f71e9b6 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -169,6 +169,9 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
 {
        struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
 
+       if (WARN_ON(!cpu_online(cpu)))
+               return;
+
        mod_delayed_work_on(cpu, system_wq, &cdbs->work, delay);
 }

see splats collection below.

And I don't think your fix above addresses the issue for the simple
reason that if cpus go offline *before* you do get_online_cpus(), then
policy->cpus will already contain offlined cpus.

Rather, a better fix would be, IMHO, to do this (it works here, of course):

---
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 5af40ad82d23..58541b164494 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -17,6 +17,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <asm/cputime.h>
+#include <linux/cpu.h>
 #include <linux/cpufreq.h>
 #include <linux/cpumask.h>
 #include <linux/export.h>
@@ -169,7 +170,15 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
 {
        struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
 
+       get_online_cpus();
+
+       if (!cpu_online(cpu))
+               goto out;
+
        mod_delayed_work_on(cpu, system_wq, &cdbs->work, delay);
+
+ out:
+       put_online_cpus();
 }
 
 void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
--


[   94.386340] EXT4-fs (sda7): re-mounted. Opts: (null)
[   96.520362] kvm: exiting hardware virtualization
[   96.637687] ACPI: Preparing to enter system sleep state S5
[   96.643506] Disabling non-boot CPUs ...
[   96.855499] ------------[ cut here ]------------
[   96.860172] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
[   96.868501] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
[   96.914238] CPU: 0 PID: 315 Comm: kworker/1:2 Tainted: G        W    3.10.0-rc1+ #2
[   96.921969] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
[   96.929424] Workqueue: events od_dbs_timer
[   96.933574]  0000000000000009 ffff88043a08bc78 ffffffff8161445c ffff88043a08bcb8
[   96.941085]  ffffffff8103e540 ffff88043b712a80 0000000000000001 ffff88043a296400
[   96.948602]  ffff88043b712a80 ffffffff81cdc910 0000000000000001 ffff88043a08bcc8
[   96.956123] Call Trace:
[   96.958602]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
[   96.963801]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
[   96.969858]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
[   96.975735]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
[   96.981359]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
[   96.986808]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
[   96.992691]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
[   96.998756]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
[   97.004371]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
[   97.010256]  [<ffffffff8106634a>] kthread+0xea/0xf0
[   97.015185]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   97.021605]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
[   97.027049]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   97.033457] ---[ end trace d36d91c626ac81a0 ]---
[   97.039221] ------------[ cut here ]------------
[   97.039227] ------------[ cut here ]------------
[   97.039229] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
[   97.039243] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
[   97.039245] CPU: 4 PID: 82 Comm: kworker/2:1 Tainted: G        W    3.10.0-rc1+ #2
[   97.039245] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
[   97.039248] Workqueue: events od_dbs_timer
[   97.039250]  0000000000000009 ffff88043b5cfc78 ffffffff8161445c ffff88043b5cfcb8
[   97.039251]  ffffffff8103e540 ffff88043b712a80 0000000000000002 ffff88043a295e00
[   97.039253]  ffff88043b712a80 ffffffff81cdc910 0000000000000002 ffff88043b5cfcc8
[   97.039253] Call Trace:
[   97.039255]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
[   97.039257]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
[   97.039258]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
[   97.039259]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
[   97.039261]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
[   97.039263]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
[   97.039264]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
[   97.039265]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
[   97.039267]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
[   97.039268]  [<ffffffff8106634a>] kthread+0xea/0xf0
[   97.039269]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   97.039270]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
[   97.039272]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   97.039272] ---[ end trace d36d91c626ac81a1 ]---
[   97.143214] nouveau E[     DRM] GPU lockup - switching to software fbcon
[   97.318430] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
[   97.326804] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
[   97.374578] CPU: 0 PID: 98 Comm: kworker/3:1 Tainted: G        W    3.10.0-rc1+ #2
[   97.384154] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
[   97.393566] Workqueue: events od_dbs_timer
[   97.399675]  0000000000000009 ffff88043b179c78 ffffffff8161445c ffff88043b179cb8
[   97.409153]  ffffffff8103e540 ffff88043b712a80 0000000000000003 ffff88043a295a00
[   97.418623]  ffff88043b712a80 ffffffff81cdc910 0000000000000003 ffff88043b179cc8
[   97.428103] Call Trace:
[   97.432520]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
[   97.439678]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
[   97.447694]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
[   97.455512]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
[   97.462993]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
[   97.470259]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
[   97.477878]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
[   97.485652]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
[   97.492969]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
[   97.500565]  [<ffffffff8106634a>] kthread+0xea/0xf0
[   97.507167]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   97.515255]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
[   97.522389]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   97.530472] ---[ end trace d36d91c626ac81a2 ]---
[   97.543176] ------------[ cut here ]------------
[   97.547172] ------------[ cut here ]------------
[   97.547178] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
[   97.547197] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
[   97.547199] CPU: 7 PID: 316 Comm: kworker/5:1 Tainted: G        W    3.10.0-rc1+ #2
[   97.547200] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
[   97.547202] Workqueue: events od_dbs_timer
[   97.547204]  0000000000000009 ffff88043905dc78 ffffffff8161445c ffff88043905dcb8
[   97.547205]  ffffffff8103e540 ffff88043b712a80 0000000000000005 ffff88043a295800
[   97.547206]  ffff88043b712a80 ffffffff81cdc910 0000000000000005 ffff88043905dcc8
[   97.547207] Call Trace:
[   97.547211]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
[   97.547214]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
[   97.547215]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
[   97.547216]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
[   97.547218]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
[   97.547220]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
[   97.547221]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
[   97.547222]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
[   97.547224]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
[   97.547225]  [<ffffffff8106634a>] kthread+0xea/0xf0
[   97.547226]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   97.547228]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
[   97.547229]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   97.547230] ---[ end trace d36d91c626ac81a3 ]---
[   97.761326] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
[   97.770798] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
[   97.819617] CPU: 0 PID: 253 Comm: kworker/4:1 Tainted: G        W    3.10.0-rc1+ #2
[   97.828623] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
[   97.837372] Workqueue: events od_dbs_timer
[   97.842805]  0000000000000009 ffff880439529c78 ffffffff8161445c ffff880439529cb8
[   97.851628]  ffffffff8103e540 ffff88043b712a80 0000000000000004 ffff88043a295c00
[   97.860445]  ffff88043b712a80 ffffffff81cdc910 0000000000000004 ffff880439529cc8
[   97.869249] Call Trace:
[   97.873041]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
[   97.879533]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
[   97.886912]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
[   97.894100]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
[   97.901002]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
[   97.907706]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
[   97.914797]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
[   97.922016]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
[   97.928803]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
[   97.935837]  [<ffffffff8106634a>] kthread+0xea/0xf0
[   97.941900]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   97.949443]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
[   97.956027]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   97.963563] ---[ end trace d36d91c626ac81a4 ]---
[   97.970449] ------------[ cut here ]------------
[   97.976277] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
[   97.985762] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
[   98.035051] CPU: 0 PID: 102 Comm: kworker/6:1 Tainted: G        W    3.10.0-rc1+ #2
[   98.044067] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
[   98.052834] Workqueue: events od_dbs_timer
[   98.058285]  0000000000000009 ffff88043b6f3c78 ffffffff8161445c ffff88043b6f3cb8
[   98.067114]  ffffffff8103e540 ffff88043b712a80 0000000000000006 ffff88043a295600
[   98.075924]  ffff88043b712a80 ffffffff81cdc910 0000000000000006 ffff88043b6f3cc8
[   98.084735] Call Trace:
[   98.088518]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
[   98.095024]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
[   98.102386]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
[   98.109565]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
[   98.116502]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
[   98.123253]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
[   98.130394]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
[   98.137667]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
[   98.144456]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
[   98.151510]  [<ffffffff8106634a>] kthread+0xea/0xf0
[   98.157583]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   98.165143]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
[   98.171730]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   98.179282] ---[ end trace d36d91c626ac81a5 ]---
[   98.185098] ------------[ cut here ]------------
[   98.190903] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
[   98.200387] Modules linked in: ext2 vfat fat loop
[   98.205029] nouveau W[   PFIFO][0000:03:00.0] unknown intr 0x00400000, ch 1
[   98.214563]  usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
[   98.258886] CPU: 0 PID: 318 Comm: kworker/7:1 Tainted: G        W    3.10.0-rc1+ #2
[   98.267919] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
[   98.276689] Workqueue: events od_dbs_timer
[   98.282147]  0000000000000009 ffff88043969dc78 ffffffff8161445c ffff88043969dcb8
[   98.290991]  ffffffff8103e540 ffff88043b712a80 0000000000000007 ffff88043a295200
[   98.299832]  ffff88043b712a80 ffffffff81cdc910 0000000000000007 ffff88043969dcc8
[   98.308671] Call Trace:
[   98.312471]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
[   98.318982]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
[   98.326376]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
[   98.333577]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
[   98.340482]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
[   98.347160]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
[   98.354232]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
[   98.361471]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
[   98.368260]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
[   98.375309]  [<ffffffff8106634a>] kthread+0xea/0xf0
[   98.381385]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   98.388951]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
[   98.395546]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
[   98.403097] ---[ end trace d36d91c626ac81a6 ]---
[   98.409180] Power down.
[   98.413109] acpi_power_off called

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20 13:23                                   ` Borislav Petkov
@ 2013-05-20 13:43                                     ` Viresh Kumar
  2013-05-20 15:08                                       ` Borislav Petkov
  2013-05-21  2:20                                     ` Michael Wang
  1 sibling, 1 reply; 54+ messages in thread
From: Viresh Kumar @ 2013-05-20 13:43 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Michael Wang, Tejun Heo, Paul E. McKenney, Jiri Kosina,
	Frederic Weisbecker, Tony Luck, linux-kernel, x86,
	Thomas Gleixner, rjw, cpufreq, linux-pm

On 20 May 2013 18:53, Borislav Petkov <bp@alien8.de> wrote:
> I just confirmed that policy->cpus contains offlined cores with this:
>
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index 5af40ad82d23..e8c25f71e9b6 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -169,6 +169,9 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
>  {
>         struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
>
> +       if (WARN_ON(!cpu_online(cpu)))
> +               return;
> +
>         mod_delayed_work_on(cpu, system_wq, &cdbs->work, delay);
>  }

Hmm, so for sure there is some locking issue there.
Have you tried my patch? I am not sure if it will fix everything but may
fix it.

> see splats collection below.
>
> And I don't think your fix above addresses the issue for the simple
> reason that if cpus go offline *before* you do get_online_cpus(), then
> policy->cpus will already contain offlined cpus.
>
> Rather, a better fix would be, IMHO, to do this (it works here, of course):
>
> ---
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index 5af40ad82d23..58541b164494 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -17,6 +17,7 @@
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
>  #include <asm/cputime.h>
> +#include <linux/cpu.h>
>  #include <linux/cpufreq.h>
>  #include <linux/cpumask.h>
>  #include <linux/export.h>
> @@ -169,7 +170,15 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
>  {
>         struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
>
> +       get_online_cpus();
> +
> +       if (!cpu_online(cpu))
> +               goto out;
> +
>         mod_delayed_work_on(cpu, system_wq, &cdbs->work, delay);
> +
> + out:
> +       put_online_cpus();
>  }
>
>  void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,

This looks fine, but I want to fix the locking rather than just
hiding the issue. :)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20 13:43                                     ` Viresh Kumar
@ 2013-05-20 15:08                                       ` Borislav Petkov
  0 siblings, 0 replies; 54+ messages in thread
From: Borislav Petkov @ 2013-05-20 15:08 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Michael Wang, Tejun Heo, Paul E. McKenney, Jiri Kosina,
	Frederic Weisbecker, Tony Luck, linux-kernel, x86,
	Thomas Gleixner, rjw, cpufreq, linux-pm

On Mon, May 20, 2013 at 07:13:08PM +0530, Viresh Kumar wrote:
> Hmm, so for sure there is some locking issue there. ave you tried my
> Hpatch?

No, not yet. Pretty busy ATM. Btw, you could try reproducing it too, in
the meantime - simply enable

CONFIG_NO_HZ_COMMON=y
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ_FULL_ALL=y
CONFIG_NO_HZ=y
CONFIG_RCU_FAST_NO_HZ=y

and halt the box. I don't know whether !x86 boxen support NO_HZ_FULL yet
though.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-20 13:23                                   ` Borislav Petkov
  2013-05-20 13:43                                     ` Viresh Kumar
@ 2013-05-21  2:20                                     ` Michael Wang
  2013-05-21  2:37                                       ` Michael Wang
  2013-05-21  7:21                                       ` Borislav Petkov
  1 sibling, 2 replies; 54+ messages in thread
From: Michael Wang @ 2013-05-21  2:20 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Viresh Kumar, Tejun Heo, Paul E. McKenney, Jiri Kosina,
	Frederic Weisbecker, Tony Luck, linux-kernel, x86,
	Thomas Gleixner, rjw, cpufreq, linux-pm

On 05/20/2013 09:23 PM, Borislav Petkov wrote:
> On Mon, May 20, 2013 at 05:24:05PM +0800, Michael Wang wrote:
>>>> diff --git a/drivers/cpufreq/cpufreq_governor.c
>>>> b/drivers/cpufreq/cpufreq_governor.c
>>>> index 443442d..449be88 100644
>>>> --- a/drivers/cpufreq/cpufreq_governor.c
>>>> +++ b/drivers/cpufreq/cpufreq_governor.c
>>>> @@ -26,6 +26,7 @@
>>>>  #include <linux/tick.h>
>>>>  #include <linux/types.h>
>>>>  #include <linux/workqueue.h>
>>>> +#include <linux/cpu.h>
>>>>
>>>>  #include "cpufreq_governor.h"
>>>>
>>>> @@ -180,8 +181,10 @@ void gov_queue_work(struct dbs_data *dbs_data,
>>>> struct cpufreq_policy *policy,
>>>>         if (!all_cpus) {
>>>>                 __gov_queue_work(smp_processor_id(), dbs_data, delay);
>>>>         } else {
>>>> +               get_online_cpus();
>>>>                 for_each_cpu(i, policy->cpus)
>>>>                         __gov_queue_work(i, dbs_data, delay);
>>>> +               put_online_cpus();
>>>>         }
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(gov_queue_work);
>>>>
>>>> This is supposed to make WARN disappear, if it works, then BINGO :)
>>>
>>> Let people test it and then we can talk :)
>>
>> Agree :)
>>
>> Borislav, would you like to take a try?
>>
>> If this fix cause other troubles, you could try get_cpu() or disable irq
>> also.
> 
> I just confirmed that policy->cpus contains offlined cores with this:
> 
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index 5af40ad82d23..e8c25f71e9b6 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -169,6 +169,9 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
>  {
>         struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
> 
> +       if (WARN_ON(!cpu_online(cpu)))
> +               return;
> +

This is not enough to prove that policy->cpus is wrong, the cpu could be
online when get from policy->cpus, but offline when checked here, since
hotplug is able to happen during the period.

>         mod_delayed_work_on(cpu, system_wq, &cdbs->work, delay);
>  }
> 
> see splats collection below.
> 
> And I don't think your fix above addresses the issue for the simple
> reason that if cpus go offline *before* you do get_online_cpus(), then
> policy->cpus will already contain offlined cpus.

I don't get it...

get_online_cpus() is just stop hotplug happen after it was invoked, so
unless policy->cpus is really wrong, otherwise all the cpu it masked
won't go offline any more.

> 
> Rather, a better fix would be, IMHO, to do this (it works here, of course):
> 
> ---
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index 5af40ad82d23..58541b164494 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -17,6 +17,7 @@
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> 
>  #include <asm/cputime.h>
> +#include <linux/cpu.h>
>  #include <linux/cpufreq.h>
>  #include <linux/cpumask.h>
>  #include <linux/export.h>
> @@ -169,7 +170,15 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
>  {
>         struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
> 
> +       get_online_cpus();

This protect nothing...before we go here, the cpu could already offline,
nothing changed...

If you really want to confirm the policy->cpus was wrong, the way should
be apply the fix I suggested, than check online in here.

If hotplug could not happen but still get an offline cpu from
policy->cpus, than we could say it's wrong, otherwise we proved nothing...

Regards,
Michael Wang

> +
> +       if (!cpu_online(cpu))
> +               goto out;
> +
>         mod_delayed_work_on(cpu, system_wq, &cdbs->work, delay);
> +
> + out:
> +       put_online_cpus();
>  }
> 
>  void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
> --
> 
> 
> [   94.386340] EXT4-fs (sda7): re-mounted. Opts: (null)
> [   96.520362] kvm: exiting hardware virtualization
> [   96.637687] ACPI: Preparing to enter system sleep state S5
> [   96.643506] Disabling non-boot CPUs ...
> [   96.855499] ------------[ cut here ]------------
> [   96.860172] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
> [   96.868501] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
> [   96.914238] CPU: 0 PID: 315 Comm: kworker/1:2 Tainted: G        W    3.10.0-rc1+ #2
> [   96.921969] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
> [   96.929424] Workqueue: events od_dbs_timer
> [   96.933574]  0000000000000009 ffff88043a08bc78 ffffffff8161445c ffff88043a08bcb8
> [   96.941085]  ffffffff8103e540 ffff88043b712a80 0000000000000001 ffff88043a296400
> [   96.948602]  ffff88043b712a80 ffffffff81cdc910 0000000000000001 ffff88043a08bcc8
> [   96.956123] Call Trace:
> [   96.958602]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
> [   96.963801]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
> [   96.969858]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
> [   96.975735]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
> [   96.981359]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
> [   96.986808]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
> [   96.992691]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
> [   96.998756]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
> [   97.004371]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
> [   97.010256]  [<ffffffff8106634a>] kthread+0xea/0xf0
> [   97.015185]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   97.021605]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
> [   97.027049]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   97.033457] ---[ end trace d36d91c626ac81a0 ]---
> [   97.039221] ------------[ cut here ]------------
> [   97.039227] ------------[ cut here ]------------
> [   97.039229] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
> [   97.039243] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
> [   97.039245] CPU: 4 PID: 82 Comm: kworker/2:1 Tainted: G        W    3.10.0-rc1+ #2
> [   97.039245] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
> [   97.039248] Workqueue: events od_dbs_timer
> [   97.039250]  0000000000000009 ffff88043b5cfc78 ffffffff8161445c ffff88043b5cfcb8
> [   97.039251]  ffffffff8103e540 ffff88043b712a80 0000000000000002 ffff88043a295e00
> [   97.039253]  ffff88043b712a80 ffffffff81cdc910 0000000000000002 ffff88043b5cfcc8
> [   97.039253] Call Trace:
> [   97.039255]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
> [   97.039257]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
> [   97.039258]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
> [   97.039259]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
> [   97.039261]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
> [   97.039263]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
> [   97.039264]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
> [   97.039265]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
> [   97.039267]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
> [   97.039268]  [<ffffffff8106634a>] kthread+0xea/0xf0
> [   97.039269]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   97.039270]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
> [   97.039272]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   97.039272] ---[ end trace d36d91c626ac81a1 ]---
> [   97.143214] nouveau E[     DRM] GPU lockup - switching to software fbcon
> [   97.318430] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
> [   97.326804] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
> [   97.374578] CPU: 0 PID: 98 Comm: kworker/3:1 Tainted: G        W    3.10.0-rc1+ #2
> [   97.384154] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
> [   97.393566] Workqueue: events od_dbs_timer
> [   97.399675]  0000000000000009 ffff88043b179c78 ffffffff8161445c ffff88043b179cb8
> [   97.409153]  ffffffff8103e540 ffff88043b712a80 0000000000000003 ffff88043a295a00
> [   97.418623]  ffff88043b712a80 ffffffff81cdc910 0000000000000003 ffff88043b179cc8
> [   97.428103] Call Trace:
> [   97.432520]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
> [   97.439678]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
> [   97.447694]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
> [   97.455512]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
> [   97.462993]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
> [   97.470259]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
> [   97.477878]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
> [   97.485652]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
> [   97.492969]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
> [   97.500565]  [<ffffffff8106634a>] kthread+0xea/0xf0
> [   97.507167]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   97.515255]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
> [   97.522389]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   97.530472] ---[ end trace d36d91c626ac81a2 ]---
> [   97.543176] ------------[ cut here ]------------
> [   97.547172] ------------[ cut here ]------------
> [   97.547178] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
> [   97.547197] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
> [   97.547199] CPU: 7 PID: 316 Comm: kworker/5:1 Tainted: G        W    3.10.0-rc1+ #2
> [   97.547200] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
> [   97.547202] Workqueue: events od_dbs_timer
> [   97.547204]  0000000000000009 ffff88043905dc78 ffffffff8161445c ffff88043905dcb8
> [   97.547205]  ffffffff8103e540 ffff88043b712a80 0000000000000005 ffff88043a295800
> [   97.547206]  ffff88043b712a80 ffffffff81cdc910 0000000000000005 ffff88043905dcc8
> [   97.547207] Call Trace:
> [   97.547211]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
> [   97.547214]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
> [   97.547215]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
> [   97.547216]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
> [   97.547218]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
> [   97.547220]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
> [   97.547221]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
> [   97.547222]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
> [   97.547224]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
> [   97.547225]  [<ffffffff8106634a>] kthread+0xea/0xf0
> [   97.547226]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   97.547228]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
> [   97.547229]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   97.547230] ---[ end trace d36d91c626ac81a3 ]---
> [   97.761326] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
> [   97.770798] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
> [   97.819617] CPU: 0 PID: 253 Comm: kworker/4:1 Tainted: G        W    3.10.0-rc1+ #2
> [   97.828623] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
> [   97.837372] Workqueue: events od_dbs_timer
> [   97.842805]  0000000000000009 ffff880439529c78 ffffffff8161445c ffff880439529cb8
> [   97.851628]  ffffffff8103e540 ffff88043b712a80 0000000000000004 ffff88043a295c00
> [   97.860445]  ffff88043b712a80 ffffffff81cdc910 0000000000000004 ffff880439529cc8
> [   97.869249] Call Trace:
> [   97.873041]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
> [   97.879533]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
> [   97.886912]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
> [   97.894100]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
> [   97.901002]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
> [   97.907706]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
> [   97.914797]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
> [   97.922016]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
> [   97.928803]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
> [   97.935837]  [<ffffffff8106634a>] kthread+0xea/0xf0
> [   97.941900]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   97.949443]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
> [   97.956027]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   97.963563] ---[ end trace d36d91c626ac81a4 ]---
> [   97.970449] ------------[ cut here ]------------
> [   97.976277] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
> [   97.985762] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
> [   98.035051] CPU: 0 PID: 102 Comm: kworker/6:1 Tainted: G        W    3.10.0-rc1+ #2
> [   98.044067] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
> [   98.052834] Workqueue: events od_dbs_timer
> [   98.058285]  0000000000000009 ffff88043b6f3c78 ffffffff8161445c ffff88043b6f3cb8
> [   98.067114]  ffffffff8103e540 ffff88043b712a80 0000000000000006 ffff88043a295600
> [   98.075924]  ffff88043b712a80 ffffffff81cdc910 0000000000000006 ffff88043b6f3cc8
> [   98.084735] Call Trace:
> [   98.088518]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
> [   98.095024]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
> [   98.102386]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
> [   98.109565]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
> [   98.116502]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
> [   98.123253]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
> [   98.130394]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
> [   98.137667]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
> [   98.144456]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
> [   98.151510]  [<ffffffff8106634a>] kthread+0xea/0xf0
> [   98.157583]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   98.165143]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
> [   98.171730]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   98.179282] ---[ end trace d36d91c626ac81a5 ]---
> [   98.185098] ------------[ cut here ]------------
> [   98.190903] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
> [   98.200387] Modules linked in: ext2 vfat fat loop
> [   98.205029] nouveau W[   PFIFO][0000:03:00.0] unknown intr 0x00400000, ch 1
> [   98.214563]  usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
> [   98.258886] CPU: 0 PID: 318 Comm: kworker/7:1 Tainted: G        W    3.10.0-rc1+ #2
> [   98.267919] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
> [   98.276689] Workqueue: events od_dbs_timer
> [   98.282147]  0000000000000009 ffff88043969dc78 ffffffff8161445c ffff88043969dcb8
> [   98.290991]  ffffffff8103e540 ffff88043b712a80 0000000000000007 ffff88043a295200
> [   98.299832]  ffff88043b712a80 ffffffff81cdc910 0000000000000007 ffff88043969dcc8
> [   98.308671] Call Trace:
> [   98.312471]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
> [   98.318982]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
> [   98.326376]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
> [   98.333577]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
> [   98.340482]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
> [   98.347160]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
> [   98.354232]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
> [   98.361471]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
> [   98.368260]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
> [   98.375309]  [<ffffffff8106634a>] kthread+0xea/0xf0
> [   98.381385]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   98.388951]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
> [   98.395546]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   98.403097] ---[ end trace d36d91c626ac81a6 ]---
> [   98.409180] Power down.
> [   98.413109] acpi_power_off called
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-21  2:20                                     ` Michael Wang
@ 2013-05-21  2:37                                       ` Michael Wang
  2013-05-21  7:21                                       ` Borislav Petkov
  1 sibling, 0 replies; 54+ messages in thread
From: Michael Wang @ 2013-05-21  2:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Viresh Kumar, Tejun Heo, Paul E. McKenney, Jiri Kosina,
	Frederic Weisbecker, Tony Luck, linux-kernel, x86,
	Thomas Gleixner, rjw, cpufreq, linux-pm

On 05/21/2013 10:20 AM, Michael Wang wrote:
[snip]
> 
> If hotplug could not happen but still get an offline cpu from
> policy->cpus, than we could say it's wrong, otherwise we proved nothing...

like this:

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 443442d..8ed8b35 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -26,6 +26,7 @@
 #include <linux/tick.h>
 #include <linux/types.h>
 #include <linux/workqueue.h>
+#include <linux/cpu.h>
 
 #include "cpufreq_governor.h"
 
@@ -169,6 +170,9 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
 {
        struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
 
+       if (WARN_ON(!cpu_online(cpu)))
+               return;
+
        mod_delayed_work_on(cpu, system_wq, &cdbs->work, delay);
 }
 
@@ -180,8 +184,10 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
        if (!all_cpus) {
                __gov_queue_work(smp_processor_id(), dbs_data, delay);
        } else {
+               get_online_cpus();
                for_each_cpu(i, policy->cpus)
                        __gov_queue_work(i, dbs_data, delay);
+               put_online_cpus();
        }
 }
 EXPORT_SYMBOL_GPL(gov_queue_work);



Would you like to try it and see whether we trigger any WARN?

If still trigger WARN, then we could make sure the problem is
the wrong 'policy->cpus' ;-)

Regards,
Michael Wang

> 
> Regards,
> Michael Wang
> 
>> +
>> +       if (!cpu_online(cpu))
>> +               goto out;
>> +
>>         mod_delayed_work_on(cpu, system_wq, &cdbs->work, delay);
>> +
>> + out:
>> +       put_online_cpus();
>>  }
>>
>>  void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
>> --
>>
>>
>> [   94.386340] EXT4-fs (sda7): re-mounted. Opts: (null)
>> [   96.520362] kvm: exiting hardware virtualization
>> [   96.637687] ACPI: Preparing to enter system sleep state S5
>> [   96.643506] Disabling non-boot CPUs ...
>> [   96.855499] ------------[ cut here ]------------
>> [   96.860172] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
>> [   96.868501] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
>> [   96.914238] CPU: 0 PID: 315 Comm: kworker/1:2 Tainted: G        W    3.10.0-rc1+ #2
>> [   96.921969] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
>> [   96.929424] Workqueue: events od_dbs_timer
>> [   96.933574]  0000000000000009 ffff88043a08bc78 ffffffff8161445c ffff88043a08bcb8
>> [   96.941085]  ffffffff8103e540 ffff88043b712a80 0000000000000001 ffff88043a296400
>> [   96.948602]  ffff88043b712a80 ffffffff81cdc910 0000000000000001 ffff88043a08bcc8
>> [   96.956123] Call Trace:
>> [   96.958602]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
>> [   96.963801]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
>> [   96.969858]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
>> [   96.975735]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
>> [   96.981359]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
>> [   96.986808]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
>> [   96.992691]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
>> [   96.998756]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
>> [   97.004371]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
>> [   97.010256]  [<ffffffff8106634a>] kthread+0xea/0xf0
>> [   97.015185]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
>> [   97.021605]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
>> [   97.027049]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
>> [   97.033457] ---[ end trace d36d91c626ac81a0 ]---
>> [   97.039221] ------------[ cut here ]------------
>> [   97.039227] ------------[ cut here ]------------
>> [   97.039229] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
>> [   97.039243] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
>> [   97.039245] CPU: 4 PID: 82 Comm: kworker/2:1 Tainted: G        W    3.10.0-rc1+ #2
>> [   97.039245] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
>> [   97.039248] Workqueue: events od_dbs_timer
>> [   97.039250]  0000000000000009 ffff88043b5cfc78 ffffffff8161445c ffff88043b5cfcb8
>> [   97.039251]  ffffffff8103e540 ffff88043b712a80 0000000000000002 ffff88043a295e00
>> [   97.039253]  ffff88043b712a80 ffffffff81cdc910 0000000000000002 ffff88043b5cfcc8
>> [   97.039253] Call Trace:
>> [   97.039255]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
>> [   97.039257]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
>> [   97.039258]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
>> [   97.039259]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
>> [   97.039261]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
>> [   97.039263]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
>> [   97.039264]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
>> [   97.039265]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
>> [   97.039267]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
>> [   97.039268]  [<ffffffff8106634a>] kthread+0xea/0xf0
>> [   97.039269]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
>> [   97.039270]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
>> [   97.039272]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
>> [   97.039272] ---[ end trace d36d91c626ac81a1 ]---
>> [   97.143214] nouveau E[     DRM] GPU lockup - switching to software fbcon
>> [   97.318430] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
>> [   97.326804] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
>> [   97.374578] CPU: 0 PID: 98 Comm: kworker/3:1 Tainted: G        W    3.10.0-rc1+ #2
>> [   97.384154] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
>> [   97.393566] Workqueue: events od_dbs_timer
>> [   97.399675]  0000000000000009 ffff88043b179c78 ffffffff8161445c ffff88043b179cb8
>> [   97.409153]  ffffffff8103e540 ffff88043b712a80 0000000000000003 ffff88043a295a00
>> [   97.418623]  ffff88043b712a80 ffffffff81cdc910 0000000000000003 ffff88043b179cc8
>> [   97.428103] Call Trace:
>> [   97.432520]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
>> [   97.439678]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
>> [   97.447694]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
>> [   97.455512]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
>> [   97.462993]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
>> [   97.470259]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
>> [   97.477878]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
>> [   97.485652]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
>> [   97.492969]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
>> [   97.500565]  [<ffffffff8106634a>] kthread+0xea/0xf0
>> [   97.507167]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
>> [   97.515255]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
>> [   97.522389]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
>> [   97.530472] ---[ end trace d36d91c626ac81a2 ]---
>> [   97.543176] ------------[ cut here ]------------
>> [   97.547172] ------------[ cut here ]------------
>> [   97.547178] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
>> [   97.547197] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
>> [   97.547199] CPU: 7 PID: 316 Comm: kworker/5:1 Tainted: G        W    3.10.0-rc1+ #2
>> [   97.547200] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
>> [   97.547202] Workqueue: events od_dbs_timer
>> [   97.547204]  0000000000000009 ffff88043905dc78 ffffffff8161445c ffff88043905dcb8
>> [   97.547205]  ffffffff8103e540 ffff88043b712a80 0000000000000005 ffff88043a295800
>> [   97.547206]  ffff88043b712a80 ffffffff81cdc910 0000000000000005 ffff88043905dcc8
>> [   97.547207] Call Trace:
>> [   97.547211]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
>> [   97.547214]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
>> [   97.547215]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
>> [   97.547216]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
>> [   97.547218]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
>> [   97.547220]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
>> [   97.547221]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
>> [   97.547222]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
>> [   97.547224]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
>> [   97.547225]  [<ffffffff8106634a>] kthread+0xea/0xf0
>> [   97.547226]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
>> [   97.547228]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
>> [   97.547229]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
>> [   97.547230] ---[ end trace d36d91c626ac81a3 ]---
>> [   97.761326] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
>> [   97.770798] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
>> [   97.819617] CPU: 0 PID: 253 Comm: kworker/4:1 Tainted: G        W    3.10.0-rc1+ #2
>> [   97.828623] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
>> [   97.837372] Workqueue: events od_dbs_timer
>> [   97.842805]  0000000000000009 ffff880439529c78 ffffffff8161445c ffff880439529cb8
>> [   97.851628]  ffffffff8103e540 ffff88043b712a80 0000000000000004 ffff88043a295c00
>> [   97.860445]  ffff88043b712a80 ffffffff81cdc910 0000000000000004 ffff880439529cc8
>> [   97.869249] Call Trace:
>> [   97.873041]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
>> [   97.879533]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
>> [   97.886912]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
>> [   97.894100]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
>> [   97.901002]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
>> [   97.907706]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
>> [   97.914797]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
>> [   97.922016]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
>> [   97.928803]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
>> [   97.935837]  [<ffffffff8106634a>] kthread+0xea/0xf0
>> [   97.941900]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
>> [   97.949443]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
>> [   97.956027]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
>> [   97.963563] ---[ end trace d36d91c626ac81a4 ]---
>> [   97.970449] ------------[ cut here ]------------
>> [   97.976277] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
>> [   97.985762] Modules linked in: ext2 vfat fat loop usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
>> [   98.035051] CPU: 0 PID: 102 Comm: kworker/6:1 Tainted: G        W    3.10.0-rc1+ #2
>> [   98.044067] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
>> [   98.052834] Workqueue: events od_dbs_timer
>> [   98.058285]  0000000000000009 ffff88043b6f3c78 ffffffff8161445c ffff88043b6f3cb8
>> [   98.067114]  ffffffff8103e540 ffff88043b712a80 0000000000000006 ffff88043a295600
>> [   98.075924]  ffff88043b712a80 ffffffff81cdc910 0000000000000006 ffff88043b6f3cc8
>> [   98.084735] Call Trace:
>> [   98.088518]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
>> [   98.095024]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
>> [   98.102386]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
>> [   98.109565]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
>> [   98.116502]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
>> [   98.123253]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
>> [   98.130394]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
>> [   98.137667]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
>> [   98.144456]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
>> [   98.151510]  [<ffffffff8106634a>] kthread+0xea/0xf0
>> [   98.157583]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
>> [   98.165143]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
>> [   98.171730]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
>> [   98.179282] ---[ end trace d36d91c626ac81a5 ]---
>> [   98.185098] ------------[ cut here ]------------
>> [   98.190903] WARNING: at drivers/cpufreq/cpufreq_governor.c:172 gov_queue_work+0xf0/0x110()
>> [   98.200387] Modules linked in: ext2 vfat fat loop
>> [   98.205029] nouveau W[   PFIFO][0000:03:00.0] unknown intr 0x00400000, ch 1
>> [   98.214563]  usbhid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_hda_codec ehci_pci xhci_hcd ehci_hcd usbcore crc32_pclmul crc32c_intel snd_hwdep snd_pcm snd_page_alloc snd_timer ghash_clmulni_intel snd aesni_intel aes_x86_64 glue_helper sb_edac edac_core acpi_cpufreq mperf pcspkr lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support evdev soundcore lpc_ich mfd_core processor dcdbas i2c_i801 usb_common button microcode
>> [   98.258886] CPU: 0 PID: 318 Comm: kworker/7:1 Tainted: G        W    3.10.0-rc1+ #2
>> [   98.267919] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
>> [   98.276689] Workqueue: events od_dbs_timer
>> [   98.282147]  0000000000000009 ffff88043969dc78 ffffffff8161445c ffff88043969dcb8
>> [   98.290991]  ffffffff8103e540 ffff88043b712a80 0000000000000007 ffff88043a295200
>> [   98.299832]  ffff88043b712a80 ffffffff81cdc910 0000000000000007 ffff88043969dcc8
>> [   98.308671] Call Trace:
>> [   98.312471]  [<ffffffff8161445c>] dump_stack+0x19/0x1b
>> [   98.318982]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
>> [   98.326376]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
>> [   98.333577]  [<ffffffff814f6bf0>] gov_queue_work+0xf0/0x110
>> [   98.340482]  [<ffffffff814f60bb>] od_dbs_timer+0xcb/0x170
>> [   98.347160]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
>> [   98.354232]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
>> [   98.361471]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
>> [   98.368260]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
>> [   98.375309]  [<ffffffff8106634a>] kthread+0xea/0xf0
>> [   98.381385]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
>> [   98.388951]  [<ffffffff81623d9c>] ret_from_fork+0x7c/0xb0
>> [   98.395546]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
>> [   98.403097] ---[ end trace d36d91c626ac81a6 ]---
>> [   98.409180] Power down.
>> [   98.413109] acpi_power_off called
>>
> 


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-21  2:20                                     ` Michael Wang
  2013-05-21  2:37                                       ` Michael Wang
@ 2013-05-21  7:21                                       ` Borislav Petkov
  2013-05-21  7:58                                         ` Michael Wang
  1 sibling, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2013-05-21  7:21 UTC (permalink / raw)
  To: Michael Wang
  Cc: Viresh Kumar, Tejun Heo, Paul E. McKenney, Jiri Kosina,
	Frederic Weisbecker, Tony Luck, linux-kernel, x86,
	Thomas Gleixner, rjw, cpufreq, linux-pm

On Tue, May 21, 2013 at 10:20:51AM +0800, Michael Wang wrote:
> This is not enough to prove that policy->cpus is wrong, the cpu could
> be online when get from policy->cpus, but offline when checked here,
> since hotplug is able to happen during the period.

Strictly speaking you're correct but I don't do any hotplug besides the
one-time thing which is part of halting the box.

> I don't get it...
> 
> get_online_cpus() is just stop hotplug happen after it was invoked, so
> unless policy->cpus is really wrong, otherwise all the cpu it masked
> won't go offline any more.

Yes, that's my impression too - at the point we do gov_queue_work,
policy->cpus already contains offline cpus.

> This protect nothing...before we go here, the cpu could already
> offline, nothing changed...

Yes, but I don't want to schedule work on an offlined cpu and that is
ensured here.

> If you really want to confirm the policy->cpus was wrong, the way
> should be apply the fix I suggested, than check online in here.

Sure, feel free to get a box, enable NO_HZ_FULL and do all the
experimentations you desire. I surely cannot be the only one who
triggers this.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-21  7:21                                       ` Borislav Petkov
@ 2013-05-21  7:58                                         ` Michael Wang
  0 siblings, 0 replies; 54+ messages in thread
From: Michael Wang @ 2013-05-21  7:58 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Viresh Kumar, Tejun Heo, Paul E. McKenney, Jiri Kosina,
	Frederic Weisbecker, Tony Luck, linux-kernel, x86,
	Thomas Gleixner, rjw, cpufreq, linux-pm

On 05/21/2013 03:21 PM, Borislav Petkov wrote:
> On Tue, May 21, 2013 at 10:20:51AM +0800, Michael Wang wrote:
>> This is not enough to prove that policy->cpus is wrong, the cpu could
>> be online when get from policy->cpus, but offline when checked here,
>> since hotplug is able to happen during the period.
> 
> Strictly speaking you're correct but I don't do any hotplug besides the
> one-time thing which is part of halting the box.

Well, they share the same cpu_down() I suppose...

> 
>> I don't get it...
>>
>> get_online_cpus() is just stop hotplug happen after it was invoked, so
>> unless policy->cpus is really wrong, otherwise all the cpu it masked
>> won't go offline any more.
> 
> Yes, that's my impression too - at the point we do gov_queue_work,
> policy->cpus already contains offline cpus.
> 
>> This protect nothing...before we go here, the cpu could already
>> offline, nothing changed...
> 
> Yes, but I don't want to schedule work on an offlined cpu and that is
> ensured here.

IMHO, the problem seems mostly like the wrong usage of policy->cpus,
it's providing the right info, but just at that time, we don't need
worry about work on offlined cpu if we don't allow cpu disappear.

Your approach could be good respect to performance, but if we could
prove that policy->cpus is correct firstly, than we could fix the
problem without any concern, don't we?

> 
>> If you really want to confirm the policy->cpus was wrong, the way
>> should be apply the fix I suggested, than check online in here.
> 
> Sure, feel free to get a box, enable NO_HZ_FULL and do all the
> experimentations you desire. I surely cannot be the only one who
> triggers this.

I'm fine if the problem get solved, that means your box doesn't show
WARN any more :)

Regards,
Michael Wang

> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-05-17 13:56           ` NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2 Borislav Petkov
  2013-05-20  3:16             ` Michael Wang
@ 2013-06-04 21:20             ` Jiri Kosina
  2013-06-05  2:30               ` Michael Wang
  1 sibling, 1 reply; 54+ messages in thread
From: Jiri Kosina @ 2013-06-04 21:20 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Paul E. McKenney, Frederic Weisbecker, Tony Luck, linux-kernel,
	x86, Thomas Gleixner

On Fri, 17 May 2013, Borislav Petkov wrote:

> commit f7ea0fd639c2c48d3c61b6eec75362be290c6874
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date:   Mon May 13 21:40:27 2013 +0200
> 
>     tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline
> 
> Now, when I halt the box, I see these splats originating from cpufreq's
> od_dbs_timer adding a workqueue which does add_timer_on:
> 
> 
> [   49.338878] EXT4-fs (sda7): re-mounted. Opts: (null)
> [   51.502417] kvm: exiting hardware virtualization
> [   51.597330] ACPI: Preparing to enter system sleep state S5
> [   51.603147] Disabling non-boot CPUs ...
> [   51.616759] ------------[ cut here ]------------
> [   51.621460] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x58/0x60()
> [   51.629638] Modules linked in: ext2 vfat fat loop snd_hda_codec_hdmi usbhid snd_hda_codec_realtek coretemp kvm_intel kvm snd_hda_intel snd_hda_codec crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel sb_edac aes_x86_64 ehci_pci snd_page_alloc glue_helper snd_timer xhci_hcd snd iTCO_wdt iTCO_vendor_support ehci_hcd edac_core lpc_ich acpi_cpufreq lrw gf128mul ablk_helper cryptd mperf usbcore usb_common soundcore mfd_core dcdbas evdev pcspkr processor i2c_i801 button microcode
> [   51.675581] CPU: 0 PID: 244 Comm: kworker/1:1 Tainted: G        W    3.10.0-rc1+ #10
> [   51.683407] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
> [   51.690901] Workqueue: events od_dbs_timer
> [   51.695069]  0000000000000009 ffff88043a2f5b68 ffffffff8161441c ffff88043a2f5ba8
> [   51.702602]  ffffffff8103e540 0000000000000033 0000000000000001 ffff88043d5f8000
> [   51.710136]  00000000ffff0ce1 0000000000000001 ffff88044fc4fc08 ffff88043a2f5bb8
> [   51.717691] Call Trace:
> [   51.720191]  [<ffffffff8161441c>] dump_stack+0x19/0x1b
> [   51.725396]  [<ffffffff8103e540>] warn_slowpath_common+0x70/0xa0
> [   51.731473]  [<ffffffff8103e58a>] warn_slowpath_null+0x1a/0x20
> [   51.737378]  [<ffffffff81025628>] native_smp_send_reschedule+0x58/0x60
> [   51.744013]  [<ffffffff81072cfd>] wake_up_nohz_cpu+0x2d/0xa0
> [   51.749745]  [<ffffffff8104f6bf>] add_timer_on+0x8f/0x110
> [   51.755214]  [<ffffffff8105f6fe>] __queue_delayed_work+0x16e/0x1a0
> [   51.761470]  [<ffffffff8105f251>] ? try_to_grab_pending+0xd1/0x1a0
> [   51.767724]  [<ffffffff8105f78a>] mod_delayed_work_on+0x5a/0xa0
> [   51.773719]  [<ffffffff814f6b5d>] gov_queue_work+0x4d/0xc0
> [   51.779271]  [<ffffffff814f60cb>] od_dbs_timer+0xcb/0x170
> [   51.784734]  [<ffffffff8105e75d>] process_one_work+0x1fd/0x540
> [   51.790634]  [<ffffffff8105e6f2>] ? process_one_work+0x192/0x540
> [   51.796711]  [<ffffffff8105ef22>] worker_thread+0x122/0x380
> [   51.802350]  [<ffffffff8105ee00>] ? rescuer_thread+0x320/0x320
> [   51.808264]  [<ffffffff8106634a>] kthread+0xea/0xf0
> [   51.813200]  [<ffffffff81066260>] ? flush_kthread_worker+0x150/0x150
> [   51.819644]  [<ffffffff81623d5c>] ret_from_fork+0x7c/0xb0

Just to not let this thread sleep -- I am seeing this as well, even with 
current Linus' tree (git HEAD == aa4f608).

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-06-04 21:20             ` Jiri Kosina
@ 2013-06-05  2:30               ` Michael Wang
  2013-06-05  8:08                 ` Jiri Kosina
  0 siblings, 1 reply; 54+ messages in thread
From: Michael Wang @ 2013-06-05  2:30 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Borislav Petkov, Paul E. McKenney, Frederic Weisbecker,
	Tony Luck, linux-kernel, x86, Thomas Gleixner

Hi, Jiri

On 06/05/2013 05:20 AM, Jiri Kosina wrote:
[snip]
> 
> Just to not let this thread sleep -- I am seeing this as well, even with 
> current Linus' tree (git HEAD == aa4f608).

Have you tried this:

diff --git a/drivers/cpufreq/cpufreq_governor.c
b/drivers/cpufreq/cpufreq_governor.c
index 443442d..449be88 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -26,6 +26,7 @@
 #include <linux/tick.h>
 #include <linux/types.h>
 #include <linux/workqueue.h>
+#include <linux/cpu.h>

 #include "cpufreq_governor.h"

@@ -180,8 +181,10 @@ void gov_queue_work(struct dbs_data *dbs_data,
struct cpufreq_policy *policy,
        if (!all_cpus) {
                __gov_queue_work(smp_processor_id(), dbs_data, delay);
        } else {
+               get_online_cpus();
                for_each_cpu(i, policy->cpus)
                        __gov_queue_work(i, dbs_data, delay);
+               put_online_cpus();
        }
 }
 EXPORT_SYMBOL_GPL(gov_queue_work);

Does it works?

Regards,
Michael Wang

> 


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-06-05  2:30               ` Michael Wang
@ 2013-06-05  8:08                 ` Jiri Kosina
  2013-06-05  8:12                   ` Michael Wang
  0 siblings, 1 reply; 54+ messages in thread
From: Jiri Kosina @ 2013-06-05  8:08 UTC (permalink / raw)
  To: Michael Wang
  Cc: Borislav Petkov, Paul E. McKenney, Frederic Weisbecker,
	Tony Luck, linux-kernel, x86, Thomas Gleixner

On Wed, 5 Jun 2013, Michael Wang wrote:

> > Just to not let this thread sleep -- I am seeing this as well, even with 
> > current Linus' tree (git HEAD == aa4f608).
> 
> Have you tried this:
> 
> diff --git a/drivers/cpufreq/cpufreq_governor.c
> b/drivers/cpufreq/cpufreq_governor.c
> index 443442d..449be88 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -26,6 +26,7 @@
>  #include <linux/tick.h>
>  #include <linux/types.h>
>  #include <linux/workqueue.h>
> +#include <linux/cpu.h>
> 
>  #include "cpufreq_governor.h"
> 
> @@ -180,8 +181,10 @@ void gov_queue_work(struct dbs_data *dbs_data,
> struct cpufreq_policy *policy,
>         if (!all_cpus) {
>                 __gov_queue_work(smp_processor_id(), dbs_data, delay);
>         } else {
> +               get_online_cpus();
>                 for_each_cpu(i, policy->cpus)
>                         __gov_queue_work(i, dbs_data, delay);
> +               put_online_cpus();
>         }
>  }
>  EXPORT_SYMBOL_GPL(gov_queue_work);
> 
> Does it works?

Yes, the warning is gone with this patch.

	Tested-by: Jiri Kosina <jkosina@suse.cz>

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2
  2013-06-05  8:08                 ` Jiri Kosina
@ 2013-06-05  8:12                   ` Michael Wang
  0 siblings, 0 replies; 54+ messages in thread
From: Michael Wang @ 2013-06-05  8:12 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Borislav Petkov, Paul E. McKenney, Frederic Weisbecker,
	Tony Luck, linux-kernel, x86, Thomas Gleixner

On 06/05/2013 04:08 PM, Jiri Kosina wrote:
> On Wed, 5 Jun 2013, Michael Wang wrote:
> 
>>> Just to not let this thread sleep -- I am seeing this as well, even with 
>>> current Linus' tree (git HEAD == aa4f608).
>>
>> Have you tried this:
>>
>> diff --git a/drivers/cpufreq/cpufreq_governor.c
>> b/drivers/cpufreq/cpufreq_governor.c
>> index 443442d..449be88 100644
>> --- a/drivers/cpufreq/cpufreq_governor.c
>> +++ b/drivers/cpufreq/cpufreq_governor.c
>> @@ -26,6 +26,7 @@
>>  #include <linux/tick.h>
>>  #include <linux/types.h>
>>  #include <linux/workqueue.h>
>> +#include <linux/cpu.h>
>>
>>  #include "cpufreq_governor.h"
>>
>> @@ -180,8 +181,10 @@ void gov_queue_work(struct dbs_data *dbs_data,
>> struct cpufreq_policy *policy,
>>         if (!all_cpus) {
>>                 __gov_queue_work(smp_processor_id(), dbs_data, delay);
>>         } else {
>> +               get_online_cpus();
>>                 for_each_cpu(i, policy->cpus)
>>                         __gov_queue_work(i, dbs_data, delay);
>> +               put_online_cpus();
>>         }
>>  }
>>  EXPORT_SYMBOL_GPL(gov_queue_work);
>>
>> Does it works?
> 
> Yes, the warning is gone with this patch.

Nice, so I think we proved that 'policy->cpus' is well maintained.

> 
> 	Tested-by: Jiri Kosina <jkosina@suse.cz>

Thanks for your testing :)

Regards,
Michael Wang

> 
> Thanks,
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2013-06-05  8:13 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-09 12:29 NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule Jiri Kosina
2013-05-09 12:50 ` Borislav Petkov
2013-05-09 12:58   ` Borislav Petkov
2013-05-15 18:45     ` Paul E. McKenney
2013-05-15 22:43       ` Borislav Petkov
2013-05-15 23:55         ` Paul E. McKenney
2013-05-17 13:56           ` NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2 Borislav Petkov
2013-05-20  3:16             ` Michael Wang
2013-05-20  4:50               ` Borislav Petkov
2013-05-20  6:23                 ` Michael Wang
2013-05-20  6:47                   ` Borislav Petkov
2013-05-20  6:58                     ` Michael Wang
2013-05-20  7:06                       ` Michael Wang
2013-05-20  7:12                         ` Viresh Kumar
2013-05-20  7:25                           ` Michael Wang
2013-05-20  8:56                             ` Michael Wang
2013-05-20  9:09                               ` Viresh Kumar
2013-05-20  9:24                                 ` Michael Wang
2013-05-20 13:23                                   ` Borislav Petkov
2013-05-20 13:43                                     ` Viresh Kumar
2013-05-20 15:08                                       ` Borislav Petkov
2013-05-21  2:20                                     ` Michael Wang
2013-05-21  2:37                                       ` Michael Wang
2013-05-21  7:21                                       ` Borislav Petkov
2013-05-21  7:58                                         ` Michael Wang
2013-05-20  7:36                     ` Tejun Heo
2013-05-20  8:10                 ` Frederic Weisbecker
2013-05-20  9:31                   ` Srivatsa S. Bhat
2013-05-20  9:40                     ` Viresh Kumar
2013-05-20 10:24                       ` Viresh Kumar
2013-06-04 21:20             ` Jiri Kosina
2013-06-05  2:30               ` Michael Wang
2013-06-05  8:08                 ` Jiri Kosina
2013-06-05  8:12                   ` Michael Wang
2013-05-10  0:29 ` NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule Frederic Weisbecker
2013-05-10  9:28   ` Borislav Petkov
2013-05-10  9:26     ` Frederic Weisbecker
2013-05-10  9:37       ` Ingo Molnar
2013-05-10  9:45         ` Borislav Petkov
2013-05-10 15:31           ` Frederic Weisbecker
2013-05-10  9:43       ` Borislav Petkov
2013-05-10 15:42       ` Jiri Kosina
2013-05-10 15:03   ` Jiri Kosina
2013-05-10 15:21     ` Borislav Petkov
2013-05-10 15:43       ` Frederic Weisbecker
2013-05-10 16:23         ` Borislav Petkov
2013-05-10 21:38           ` Frederic Weisbecker
2013-05-13 14:56             ` Jiri Kosina
2013-05-13 19:40               ` Thomas Gleixner
2013-05-13 20:01                 ` Jiri Kosina
2013-05-14 15:46                 ` [tip:timers/urgent] tick: Don't invoke tick_nohz_stop_sched_tick( ) if the cpu is offline tip-bot for Thomas Gleixner
2013-05-15 19:41                   ` Frederic Weisbecker
2013-05-16 14:06                     ` Thomas Gleixner
2013-05-16 14:15                       ` Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).