* v3.0: Weird kernel log message when resuming avout NMI received
@ 2011-07-31 6:56 Francis Moreau
  2011-07-31 11:06 ` Cyrill Gorcunov
  2011-08-02 19:52 ` Don Zickus
  0 siblings, 2 replies; 15+ messages in thread
From: Francis Moreau @ 2011-07-31 6:56 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Hello,

I'm seeing these kernel messages when resuming:

[ 524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
[ 524.973288] Do you have a strange power saving mode enabled?
[ 524.973289] Dazed and confused, but trying to continue

I don't know whether it's important, because the system seems to work
afterwards, but it may be worth reporting.
-- 
Francis
* Re: v3.0: Weird kernel log message when resuming avout NMI received 2011-07-31 6:56 v3.0: Weird kernel log message when resuming avout NMI received Francis Moreau @ 2011-07-31 11:06 ` Cyrill Gorcunov 2011-07-31 11:19 ` Jiri Slaby 2011-07-31 15:20 ` Francis Moreau 2011-08-02 19:52 ` Don Zickus 1 sibling, 2 replies; 15+ messages in thread From: Cyrill Gorcunov @ 2011-07-31 11:06 UTC (permalink / raw) To: Francis Moreau; +Cc: Linux Kernel Mailing List On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote: > Hello, > > I'm seeing those kernel message when resuming: > > [ 524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0. > [ 524.973288] Do you have a strange power saving mode enabled? > [ 524.973289] Dazed and confused, but trying to continue > > I don't know if it's important or not because the system seems to work > after but maybe it worths to report > Hi Francis, cpu please? Cyrill ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: v3.0: Weird kernel log message when resuming avout NMI received 2011-07-31 11:06 ` Cyrill Gorcunov @ 2011-07-31 11:19 ` Jiri Slaby 2011-07-31 11:26 ` Cyrill Gorcunov 2011-08-02 19:48 ` Don Zickus 2011-07-31 15:20 ` Francis Moreau 1 sibling, 2 replies; 15+ messages in thread From: Jiri Slaby @ 2011-07-31 11:19 UTC (permalink / raw) To: Cyrill Gorcunov; +Cc: Francis Moreau, Linux Kernel Mailing List On 07/31/2011 01:06 PM, Cyrill Gorcunov wrote: > On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote: >> Hello, >> >> I'm seeing those kernel message when resuming: >> >> [ 524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0. >> [ 524.973288] Do you have a strange power saving mode enabled? >> [ 524.973289] Dazed and confused, but trying to continue >> >> I don't know if it's important or not because the system seems to work >> after but maybe it worths to report >> > > Hi Francis, cpu please? Hi, I'm seeing those too. For a longer time though. Please see https://bugzilla.novell.com/show_bug.cgi?id=678882#c16 I suspect NMI watchdog rewrite caused this. 

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz
stepping        : 11
cpu MHz         : 2993.060
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts tpr_shadow vnmi flexpriority
bogomips        : 5986.12
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz
stepping        : 11
cpu MHz         : 2993.060
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts tpr_shadow vnmi flexpriority
bogomips        : 5984.98
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

thanks,
-- 
js
* Re: v3.0: Weird kernel log message when resuming avout NMI received 2011-07-31 11:19 ` Jiri Slaby @ 2011-07-31 11:26 ` Cyrill Gorcunov 2011-08-02 19:48 ` Don Zickus 1 sibling, 0 replies; 15+ messages in thread From: Cyrill Gorcunov @ 2011-07-31 11:26 UTC (permalink / raw) To: Jiri Slaby Cc: Francis Moreau, Linux Kernel Mailing List, Don Zickus, Ingo Molnar, Stephane Eranian On Sun, Jul 31, 2011 at 01:19:29PM +0200, Jiri Slaby wrote: > On 07/31/2011 01:06 PM, Cyrill Gorcunov wrote: > > On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote: > >> Hello, > >> > >> I'm seeing those kernel message when resuming: > >> > >> [ 524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0. > >> [ 524.973288] Do you have a strange power saving mode enabled? > >> [ 524.973289] Dazed and confused, but trying to continue > >> > >> I don't know if it's important or not because the system seems to work > >> after but maybe it worths to report > >> > > > > Hi Francis, cpu please? > > Hi, I'm seeing those too. For a longer time though. Please see > https://bugzilla.novell.com/show_bug.cgi?id=678882#c16 > > I suspect NMI watchdog rewrite caused this. > Crap. I thought we've resolved it (not exactly this one but still unknown nmi issue) :) Still I don't remember if we paid much attention on suspend/resume time. Don? Cyrill ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: v3.0: Weird kernel log message when resuming avout NMI received 2011-07-31 11:19 ` Jiri Slaby 2011-07-31 11:26 ` Cyrill Gorcunov @ 2011-08-02 19:48 ` Don Zickus 2011-08-02 21:00 ` Jiri Slaby 1 sibling, 1 reply; 15+ messages in thread From: Don Zickus @ 2011-08-02 19:48 UTC (permalink / raw) To: Jiri Slaby; +Cc: Cyrill Gorcunov, Francis Moreau, Linux Kernel Mailing List On Sun, Jul 31, 2011 at 01:19:29PM +0200, Jiri Slaby wrote: > On 07/31/2011 01:06 PM, Cyrill Gorcunov wrote: > > On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote: > >> Hello, > >> > >> I'm seeing those kernel message when resuming: > >> > >> [ 524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0. > >> [ 524.973288] Do you have a strange power saving mode enabled? > >> [ 524.973289] Dazed and confused, but trying to continue > >> > >> I don't know if it's important or not because the system seems to work > >> after but maybe it worths to report > >> > > > > Hi Francis, cpu please? > > Hi, I'm seeing those too. For a longer time though. Please see > https://bugzilla.novell.com/show_bug.cgi?id=678882#c16 That is with 2.6.37. Lots of watchdog/perf nmi fixes have been added since then (though dmesg suggests the watchdog is disabled due to a broken PMU). I think for the Core iX family, 2.6.39 and higher have been stable. Cheers, Don ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: v3.0: Weird kernel log message when resuming avout NMI received 2011-08-02 19:48 ` Don Zickus @ 2011-08-02 21:00 ` Jiri Slaby 2011-08-03 0:52 ` Donald Zickus II 0 siblings, 1 reply; 15+ messages in thread From: Jiri Slaby @ 2011-08-02 21:00 UTC (permalink / raw) To: Don Zickus; +Cc: Cyrill Gorcunov, Francis Moreau, Linux Kernel Mailing List On 08/02/2011 09:48 PM, Don Zickus wrote: > On Sun, Jul 31, 2011 at 01:19:29PM +0200, Jiri Slaby wrote: >> On 07/31/2011 01:06 PM, Cyrill Gorcunov wrote: >>> On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote: >>>> Hello, >>>> >>>> I'm seeing those kernel message when resuming: >>>> >>>> [ 524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0. >>>> [ 524.973288] Do you have a strange power saving mode enabled? >>>> [ 524.973289] Dazed and confused, but trying to continue >>>> >>>> I don't know if it's important or not because the system seems to work >>>> after but maybe it worths to report >>>> >>> >>> Hi Francis, cpu please? >> >> Hi, I'm seeing those too. For a longer time though. Please see >> https://bugzilla.novell.com/show_bug.cgi?id=678882#c16 > > That is with 2.6.37. Lots of watchdog/perf nmi fixes have been added > since then (though dmesg suggests the watchdog is disabled due to a broken > PMU). No, comment 16 (the link above) is about 3.0. > I think for the Core iX family, 2.6.39 and higher have been stable. It looks like not quite... regards, -- js ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: v3.0: Weird kernel log message when resuming avout NMI received 2011-08-02 21:00 ` Jiri Slaby @ 2011-08-03 0:52 ` Donald Zickus II 0 siblings, 0 replies; 15+ messages in thread From: Donald Zickus II @ 2011-08-03 0:52 UTC (permalink / raw) To: Jiri Slaby; +Cc: Cyrill Gorcunov, Francis Moreau, Linux Kernel Mailing List ----- Original Message ----- > On 08/02/2011 09:48 PM, Don Zickus wrote: > > On Sun, Jul 31, 2011 at 01:19:29PM +0200, Jiri Slaby wrote: > >> On 07/31/2011 01:06 PM, Cyrill Gorcunov wrote: > >>> On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote: > >>>> Hello, > >>>> > >>>> I'm seeing those kernel message when resuming: > >>>> > >>>> [ 524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0. > >>>> [ 524.973288] Do you have a strange power saving mode enabled? > >>>> [ 524.973289] Dazed and confused, but trying to continue > >>>> > >>>> I don't know if it's important or not because the system seems to > >>>> work > >>>> after but maybe it worths to report > >>>> > >>> > >>> Hi Francis, cpu please? > >> > >> Hi, I'm seeing those too. For a longer time though. Please see > >> https://bugzilla.novell.com/show_bug.cgi?id=678882#c16 > > > > That is with 2.6.37. Lots of watchdog/perf nmi fixes have been added > > since then (though dmesg suggests the watchdog is disabled due to a > > broken > > PMU). > > No, comment 16 (the link above) is about 3.0. Hi Jiri, My apologies. I see a bugzilla and just scroll to the top, habit I guess. It seems like Peter has a fix already. Hopefully it works. Cheers, Don ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-07-31 11:06 ` Cyrill Gorcunov
  2011-07-31 11:19 ` Jiri Slaby
@ 2011-07-31 15:20 ` Francis Moreau
  2011-07-31 15:32 ` Cyrill Gorcunov
  1 sibling, 1 reply; 15+ messages in thread
From: Francis Moreau @ 2011-07-31 15:20 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Linux Kernel Mailing List

Hi Cyrill

On Sun, Jul 31, 2011 at 1:06 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote:
>> Hello,
>>
>> I'm seeing those kernel message when resuming:
>>
>> [ 524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
>> [ 524.973288] Do you have a strange power saving mode enabled?
>> [ 524.973289] Dazed and confused, but trying to continue
>>
>> I don't know if it's important or not because the system seems to work
>> after but maybe it worths to report
>>
>
> Hi Francis, cpu please?
>

Still the same which causes some kvm-tools madness ;)

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 37
model name      : Intel(R) Core(TM) i5 CPU M 460 @ 2.53GHz
stepping        : 5
cpu MHz         : 1197.000
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida arat dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 5053.42
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 37
model name      : Intel(R) Core(TM) i5 CPU M 460 @ 2.53GHz
stepping        : 5
cpu MHz         : 1197.000
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 2
apicid          : 4
initial apicid  : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida arat dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 5053.72
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 37
model name      : Intel(R) Core(TM) i5 CPU M 460 @ 2.53GHz
stepping        : 5
cpu MHz         : 1197.000
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida arat dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 5053.70
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 37
model name      : Intel(R) Core(TM) i5 CPU M 460 @ 2.53GHz
stepping        : 5
cpu MHz         : 1197.000
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 2
apicid          : 5
initial apicid  : 5
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida arat dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 5053.74
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

-- 
Francis
* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-07-31 15:20 ` Francis Moreau
@ 2011-07-31 15:32 ` Cyrill Gorcunov
  2011-08-01 11:05 ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Cyrill Gorcunov @ 2011-07-31 15:32 UTC (permalink / raw)
  To: Francis Moreau
  Cc: LKML, Don Zickus, Peter Zijlstra, Stephane Eranian, Ingo Molnar

On Sun, Jul 31, 2011 at 05:20:15PM +0200, Francis Moreau wrote:
> Hi Cyrill
>
> On Sun, Jul 31, 2011 at 1:06 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> > On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote:
> >> Hello,
> >>
> >> I'm seeing those kernel message when resuming:
> >>
> >> [ 524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
> >> [ 524.973288] Do you have a strange power saving mode enabled?
> >> [ 524.973289] Dazed and confused, but trying to continue
> >>
> >> I don't know if it's important or not because the system seems to work
> >> after but maybe it worths to report
> >>
> >
> > Hi Francis, cpu please?
> >
>
> Still the same which causes some kvm-tools madness ;)
>

OK, thanks Francis. I've CC'ed a couple of people involved in the x86
perf implementation. I believe that at the moment of wakeup we could
drop any/all overflow flags in the perf registers (I haven't looked
into it yet; I will try to find some time a bit later).

	Cyrill
* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-07-31 15:32 ` Cyrill Gorcunov
@ 2011-08-01 11:05 ` Peter Zijlstra
  2011-08-01 11:21 ` Cyrill Gorcunov
  2011-08-15 8:37 ` Francis Moreau
  0 siblings, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2011-08-01 11:05 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Francis Moreau, LKML, Don Zickus, Stephane Eranian, Ingo Molnar

On Sun, 2011-07-31 at 19:32 +0400, Cyrill Gorcunov wrote:
> > >> I'm seeing those kernel message when resuming:
> > >>
> > >> [ 524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
> > >> [ 524.973288] Do you have a strange power saving mode enabled?
> > >> [ 524.973289] Dazed and confused, but trying to continue
> > >>
> > >> I don't know if it's important or not because the system seems to work
> > >> after but maybe it worths to report

So I guess the problem is the NMI watchdog and suspend stuff not
shutting things down properly..

Argh, the PM notifier muck runs before the hotplug notifiers and it
doesn't avoid hotplug races on its own.. what crap.

something like the below perhaps, compile tested only.. does it work?

---
Subject: perf: Add PM notifiers
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Mon Aug 01 12:49:14 CEST 2011

Francis reports that s2r gets him spurious NMIs, this is because the
suspend code leaves the boot cpu up and running.

Cure this by adding a suspend notifier. The problem is that hotplug
and suspend are completely un-serialized and the PM notifiers run
before the suspend cpu unplug of all but the boot cpu.

This leaves a window where the user can initialize another hotplug
operation (either remove or add a cpu) resulting in either one too
many or one too few hotplug ops. Thus we cannot use the hotplug code
for the suspend case.

There's another reason to not use the hotplug code, which is that the
hotplug code totally destroys the perf state, we can do better for
suspend and simply remove all counters from the PMU so that we can
re-instate them on resume.

Reported-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/events/core.c |   97 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 95 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/events/core.c
===================================================================
--- linux-2.6.orig/kernel/events/core.c
+++ linux-2.6/kernel/events/core.c
@@ -29,6 +29,7 @@
 #include <linux/hardirq.h>
 #include <linux/rculist.h>
 #include <linux/uaccess.h>
+#include <linux/suspend.h>
 #include <linux/syscalls.h>
 #include <linux/anon_inodes.h>
 #include <linux/kernel_stat.h>
@@ -6809,7 +6810,7 @@ static void __cpuinit perf_event_init_cp
 	struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
 
 	mutex_lock(&swhash->hlist_mutex);
-	if (swhash->hlist_refcount > 0) {
+	if (swhash->hlist_refcount > 0 && !swhash->swevent_hlist) {
 		struct swevent_hlist *hlist;
 
 		hlist = kzalloc_node(sizeof(*hlist), GFP_KERNEL, cpu_to_node(cpu));
@@ -6898,7 +6899,14 @@ perf_cpu_notify(struct notifier_block *s
 {
 	unsigned int cpu = (long)hcpu;
 
-	switch (action & ~CPU_TASKS_FROZEN) {
+	/*
+	 * Ignore suspend/resume action, the perf_pm_notifier will
+	 * take care of that.
+	 */
+	if (action & CPU_TASKS_FROZEN)
+		return NOTIFY_OK;
+
+	switch (action) {
 
 	case CPU_UP_PREPARE:
 	case CPU_DOWN_FAILED:
@@ -6917,6 +6925,90 @@ perf_cpu_notify(struct notifier_block *s
 	return NOTIFY_OK;
 }
 
+static void perf_pm_resume_cpu(void *unused)
+{
+	struct perf_cpu_context *cpuctx;
+	struct perf_event_context *ctx;
+	struct pmu *pmu;
+	int idx;
+
+	idx = srcu_read_lock(&pmus_srcu);
+	list_for_each_entry_rcu(pmu, &pmus, entry) {
+		cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+		ctx = cpuctx->task_ctx;
+
+		perf_ctx_lock(cpuctx, ctx);
+		perf_pmu_disable(cpuctx->ctx.pmu);
+
+		cpu_ctx_sched_out(cpuctx, EVENT_ALL);
+		if (ctx)
+			ctx_sched_out(ctx, cpuctx, EVENT_ALL);
+
+		perf_pmu_enable(cpuctx->ctx.pmu);
+		perf_ctx_unlock(cpuctx, ctx);
+	}
+	srcu_read_unlock(&pmus_srcu, idx);
+}
+
+static void perf_pm_suspend_cpu(void *unused)
+{
+	struct perf_cpu_context *cpuctx;
+	struct perf_event_context *ctx;
+	struct pmu *pmu;
+	int idx;
+
+	idx = srcu_read_lock(&pmus_srcu);
+	list_for_each_entry_rcu(pmu, &pmus, entry) {
+		cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+		ctx = cpuctx->task_ctx;
+
+		perf_ctx_lock(cpuctx, ctx);
+		perf_pmu_disable(cpuctx->ctx.pmu);
+
+		perf_event_sched_in(cpuctx, ctx, current);
+
+		perf_pmu_enable(cpuctx->ctx.pmu);
+		perf_ctx_unlock(cpuctx, ctx);
+	}
+	srcu_read_unlock(&pmus_srcu, idx);
+}
+
+static int perf_resume(void)
+{
+	get_online_cpus();
+	smp_call_function(perf_pm_resume_cpu, NULL, 1);
+	put_online_cpus();
+
+	return NOTIFY_OK;
+}
+
+static int perf_suspend(void)
+{
+	get_online_cpus();
+	smp_call_function(perf_pm_suspend_cpu, NULL, 1);
+	put_online_cpus();
+
+	return NOTIFY_OK;
+}
+
+static int perf_pm(struct notifier_block *self, unsigned long action, void *ptr)
+{
+	switch (action) {
+	case PM_POST_HIBERNATION:
+	case PM_POST_SUSPEND:
+		return perf_resume();
+	case PM_HIBERNATION_PREPARE:
+	case PM_SUSPEND_PREPARE:
+		return perf_suspend();
+	default:
+		return NOTIFY_DONE;
+	}
+}
+
+static struct notifier_block perf_pm_notifier = {
+	.notifier_call = perf_pm,
+};
+
 void __init perf_event_init(void)
 {
 	int ret;
@@ -6931,6 +7023,7 @@ void __init perf_event_init(void)
 	perf_tp_register();
 	perf_cpu_notifier(perf_cpu_notify);
 	register_reboot_notifier(&perf_reboot_notifier);
+	register_pm_notifier(&perf_pm_notifier);
 
 	ret = init_hw_breakpoint();
 	WARN(ret, "hw_breakpoint initialization failed with: %d", ret);
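Editorial sketch, not part of Peter's patch: the two dispatch changes in the diff can be modelled in plain userspace C. The numeric constant values, the enum names, and the helper names (hotplug_handled_before, hotplug_handled_after, pm_route_for, dispatch_example) are assumptions made for illustration; only the shape of the two switch statements follows the diff. It shows that a hotplug event carrying the FROZEN bit used to be handled like an ordinary one and is now skipped, and how the four PM events are routed.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel constants. The values are
 * assumptions chosen for this sketch, not copied from kernel headers. */
#define CPU_UP_PREPARE    0x0003UL
#define CPU_DOWN_FAILED   0x0006UL
#define CPU_TASKS_FROZEN  0x0010UL

enum pm_event {
	PM_HIBERNATION_PREPARE = 1,
	PM_POST_HIBERNATION,
	PM_SUSPEND_PREPARE,
	PM_POST_SUSPEND,
	PM_RESTORE_PREPARE	/* example of an event the switch ignores */
};

enum pm_route { ROUTE_NONE, ROUTE_SUSPEND, ROUTE_RESUME };

/* Before the patch: the FROZEN bit is masked off, so hotplug events
 * raised during suspend/resume look like ordinary hotplug events. */
static bool hotplug_handled_before(unsigned long action)
{
	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_UP_PREPARE:
	case CPU_DOWN_FAILED:
		return true;
	default:
		return false;
	}
}

/* After the patch: anything carrying the FROZEN bit is skipped and
 * left to the PM notifier. */
static bool hotplug_handled_after(unsigned long action)
{
	if (action & CPU_TASKS_FROZEN)
		return false;

	switch (action) {
	case CPU_UP_PREPARE:
	case CPU_DOWN_FAILED:
		return true;
	default:
		return false;
	}
}

/* Same case grouping as the switch in perf_pm(). */
static enum pm_route pm_route_for(enum pm_event event)
{
	switch (event) {
	case PM_POST_HIBERNATION:
	case PM_POST_SUSPEND:
		return ROUTE_RESUME;
	case PM_HIBERNATION_PREPARE:
	case PM_SUSPEND_PREPARE:
		return ROUTE_SUSPEND;
	default:
		return ROUTE_NONE;
	}
}

/* Usage example: spells out the behavioural difference. */
void dispatch_example(void)
{
	unsigned long frozen_up = CPU_UP_PREPARE | CPU_TASKS_FROZEN;

	assert(hotplug_handled_before(frozen_up));
	assert(!hotplug_handled_after(frozen_up));
	assert(hotplug_handled_after(CPU_UP_PREPARE));
	assert(pm_route_for(PM_SUSPEND_PREPARE) == ROUTE_SUSPEND);
	assert(pm_route_for(PM_POST_SUSPEND) == ROUTE_RESUME);
	assert(pm_route_for(PM_RESTORE_PREPARE) == ROUTE_NONE);
}
```

The sketch deliberately leaves out what the suspend and resume paths do to the PMU; it only models which path each event reaches.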
* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-08-01 11:05 ` Peter Zijlstra
@ 2011-08-01 11:21 ` Cyrill Gorcunov
  2011-08-01 12:12 ` Peter Zijlstra
  2011-08-15 8:37 ` Francis Moreau
  1 sibling, 1 reply; 15+ messages in thread
From: Cyrill Gorcunov @ 2011-08-01 11:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Francis Moreau, LKML, Don Zickus, Stephane Eranian, Ingo Molnar, Jiri Slaby

On Mon, Aug 01, 2011 at 01:05:02PM +0200, Peter Zijlstra wrote:
> On Sun, 2011-07-31 at 19:32 +0400, Cyrill Gorcunov wrote:
> > > >> I'm seeing those kernel message when resuming:
> > > >>
> > > >> [ 524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
> > > >> [ 524.973288] Do you have a strange power saving mode enabled?
> > > >> [ 524.973289] Dazed and confused, but trying to continue
> > > >>
> > > >> I don't know if it's important or not because the system seems to work
> > > >> after but maybe it worths to report
>
> So I guess the problem is the NMI watchdog and suspend stuff not
> shutting things down properly..
>
> Argh, the PM notifier muck runs before the hotplug notifiers and it
> doesn't avoid hotplug races on its own.. what crap.
>
> something like the below perhaps, compile tested only.. does it work?
>

Thanks a lot, Peter! (I'm CC'ing Jiri as well; he has the same issue.)

...
> @@ -6809,7 +6810,7 @@ static void __cpuinit perf_event_init_cp
>  	struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
>
>  	mutex_lock(&swhash->hlist_mutex);
> -	if (swhash->hlist_refcount > 0) {
> +	if (swhash->hlist_refcount > 0 && !swhash->swevent_hlist) {

Shouldn't there be rcu_dereference(swhash->swevent_hlist)?

>  		struct swevent_hlist *hlist;
>
>  		hlist = kzalloc_node(sizeof(*hlist), GFP_KERNEL, cpu_to_node(cpu));
...

	Cyrill
* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-08-01 11:21 ` Cyrill Gorcunov
@ 2011-08-01 12:12 ` Peter Zijlstra
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2011-08-01 12:12 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Francis Moreau, LKML, Don Zickus, Stephane Eranian, Ingo Molnar, Jiri Slaby

On Mon, 2011-08-01 at 15:21 +0400, Cyrill Gorcunov wrote:
> >       mutex_lock(&swhash->hlist_mutex);
> > -     if (swhash->hlist_refcount > 0) {
> > +     if (swhash->hlist_refcount > 0 && !swhash->swevent_hlist) {
>
> Should not there be rcu_dereference(swhash->swevent_hlist)?

swhash->hlist_mutex is the modifier lock for that rcu use and should
thus serialize things sufficiently.
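Editorial sketch of the point Peter makes here, as a userspace analogy rather than kernel code: when every writer of a published pointer holds the same mutex, the writer may test the pointer without reader-side ordering, while lock-free readers still need an ordered load. The types and names (hlist_model, swhash_model, swhash_model_init_cpu, swhash_model_reader, swhash_model_example) are hypothetical, and C11 acquire/release atomics stand in for the kernel's RCU primitives, which is an assumption of the sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct hlist_model {
	int nr_buckets;
};

struct swhash_model {
	pthread_mutex_t mutex;			/* taken by every writer */
	int refcount;				/* only touched under mutex */
	_Atomic(struct hlist_model *) hlist;	/* read locklessly by readers */
};

static void swhash_model_init(struct swhash_model *sw)
{
	pthread_mutex_init(&sw->mutex, NULL);
	sw->refcount = 0;
	atomic_init(&sw->hlist, (struct hlist_model *)NULL);
}

/* Writer side. Returns 1 if a table was allocated, 0 otherwise.
 * Under the mutex no other writer can change the pointer, so a relaxed
 * load is enough for the NULL test; the release store publishes the
 * fully initialised table to readers. */
static int swhash_model_init_cpu(struct swhash_model *sw)
{
	int allocated = 0;

	pthread_mutex_lock(&sw->mutex);
	if (sw->refcount > 0 &&
	    atomic_load_explicit(&sw->hlist, memory_order_relaxed) == NULL) {
		struct hlist_model *h = calloc(1, sizeof(*h));

		if (h) {
			h->nr_buckets = 256;
			atomic_store_explicit(&sw->hlist, h,
					      memory_order_release);
			allocated = 1;
		}
	}
	pthread_mutex_unlock(&sw->mutex);
	return allocated;
}

/* Reader side: takes no lock, so it needs an ordered (acquire) load. */
static int swhash_model_reader(struct swhash_model *sw)
{
	struct hlist_model *h =
		atomic_load_explicit(&sw->hlist, memory_order_acquire);

	return h ? h->nr_buckets : 0;
}

/* Usage example: the second call must not allocate again. */
void swhash_model_example(void)
{
	struct swhash_model sw;
	int first, second, third;

	swhash_model_init(&sw);
	first = swhash_model_init_cpu(&sw);	/* refcount is 0 */
	sw.refcount = 1;
	second = swhash_model_init_cpu(&sw);	/* allocates */
	third = swhash_model_init_cpu(&sw);	/* pointer already set */

	assert(first == 0 && second == 1 && third == 0);
	assert(swhash_model_reader(&sw) == 256);

	free(atomic_load(&sw.hlist));
	pthread_mutex_destroy(&sw.mutex);
}
```

The third call returning 0 corresponds to the extra `!swhash->swevent_hlist` test in the patch, which keeps an already allocated table from being allocated a second time.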
* Re: v3.0: Weird kernel log message when resuming avout NMI received 2011-08-01 11:05 ` Peter Zijlstra 2011-08-01 11:21 ` Cyrill Gorcunov @ 2011-08-15 8:37 ` Francis Moreau 2011-08-16 15:07 ` Francis Moreau 1 sibling, 1 reply; 15+ messages in thread From: Francis Moreau @ 2011-08-15 8:37 UTC (permalink / raw) To: Peter Zijlstra Cc: Cyrill Gorcunov, LKML, Don Zickus, Stephane Eranian, Ingo Molnar Hello Peter, Sorry for the loooonnng delay but summer rest :) On Mon, Aug 1, 2011 at 1:05 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > On Sun, 2011-07-31 at 19:32 +0400, Cyrill Gorcunov wrote: > >> > >> I'm seeing those kernel message when resuming: >> > >> >> > >> [ 524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0. >> > >> [ 524.973288] Do you have a strange power saving mode enabled? >> > >> [ 524.973289] Dazed and confused, but trying to continue >> > >> >> > >> I don't know if it's important or not because the system seems to work >> > >> after but maybe it worths to report > > So I guess the problem is the NMI watchdog and suspend stuff not > shutting things down properly.. > > Argh, the PM notifier muck runs before the hotplug notifiers and it > doesn't avoid hotplug races on its own.. what crap. > > something like the below perhaps, compile tested only.. does it work? Thanks for the fix, I'm going to give it a test and report in 2 days. -- Francis ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-08-15 8:37 ` Francis Moreau
@ 2011-08-16 15:07 ` Francis Moreau
  0 siblings, 0 replies; 15+ messages in thread
From: Francis Moreau @ 2011-08-16 15:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Cyrill Gorcunov, LKML, Don Zickus, Stephane Eranian, Ingo Molnar

Peter,

On Mon, Aug 15, 2011 at 10:37 AM, Francis Moreau <francis.moro@gmail.com> wrote:
> Hello Peter,
>
> Sorry for the loooonnng delay but summer rest :)
>
> On Mon, Aug 1, 2011 at 1:05 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>> On Sun, 2011-07-31 at 19:32 +0400, Cyrill Gorcunov wrote:
>>
>>> > >> I'm seeing those kernel message when resuming:
>>> > >>
>>> > >> [ 524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
>>> > >> [ 524.973288] Do you have a strange power saving mode enabled?
>>> > >> [ 524.973289] Dazed and confused, but trying to continue
>>> > >>
>>> > >> I don't know if it's important or not because the system seems to work
>>> > >> after but maybe it worths to report
>>
>> So I guess the problem is the NMI watchdog and suspend stuff not
>> shutting things down properly..
>>
>> Argh, the PM notifier muck runs before the hotplug notifiers and it
>> doesn't avoid hotplug races on its own.. what crap.
>>
>> something like the below perhaps, compile tested only.. does it work?
>
> Thanks for the fix, I'm going to give it a test and report in 2 days.
>

I'm trying to test the fix but have to stick with the 3.0 kernel and
your patch seems to be based on 3.1.

So I had to backport a couple of patches otherwise I'm getting the
following error:

  CC      kernel/events/core.o
kernel/events/core.c: In function ‘perf_pm_resume_cpu’:
kernel/events/core.c:7342: error: implicit declaration of function ‘perf_ctx_lock’
kernel/events/core.c:7350: error: implicit declaration of function ‘perf_ctx_unlock’
kernel/events/core.c: In function ‘perf_pm_suspend_cpu’:
kernel/events/core.c:7370: error: implicit declaration of function ‘perf_event_sched_in’

Here are the patches that I backported on top of 3.0:

  perf: Collect the schedule-in rules in one function
  perf: Change and simplify ctx::is_active semantics
  perf: Simplify and fix __perf_install_in_context()
  perf: Remove task_ctx_sched_in()
  perf: Optimize event scheduling locking
  perf: Clean up 'ctx' reference counting
  perf: Optimize ctx_sched_out()

After the second suspend/resume sequence I still have the same warning
except that the reason has changed to '2d':

kernel:[ 404.091393] Uhhuh. NMI received for unknown reason 2d on CPU 0.
kernel:[ 404.091394] Do you have a strange power saving mode enabled?
kernel:[ 404.091396] Dazed and confused, but trying to continue

Thanks
-- 
Francis
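Editorial sketch on why both "3d" and "2d" are reported as unknown. As far as I understand the x86 code of that era, the two hex digits are the NMI status byte read from I/O port 0x61, and only its top two bits (SERR and IOCHK) name a source the generic code recognises; the message is printed when neither bit is set and no registered handler, such as perf, has claimed the NMI. The macro values below follow that understanding and should be treated as assumptions, and the enum and function names (nmi_source, classify_nmi_reason, nmi_reason_example) are hypothetical.

```c
#include <assert.h>

/* Assumed layout of the NMI status byte from port 0x61. */
#define NMI_REASON_SERR   0x80u	/* system error, e.g. PCI SERR# */
#define NMI_REASON_IOCHK  0x40u	/* I/O channel check */
#define NMI_REASON_MASK   (NMI_REASON_SERR | NMI_REASON_IOCHK)

enum nmi_source {
	NMI_SOURCE_UNKNOWN,
	NMI_SOURCE_SERR,
	NMI_SOURCE_IOCHK
};

/* Classify a reason byte the way the generic fallback path would:
 * SERR is checked first, then IOCHK, otherwise the source is unknown. */
static enum nmi_source classify_nmi_reason(unsigned int reason)
{
	assert(reason <= 0xffu);	/* a port read yields one byte */

	if (!(reason & NMI_REASON_MASK))
		return NMI_SOURCE_UNKNOWN;
	if (reason & NMI_REASON_SERR)
		return NMI_SOURCE_SERR;
	return NMI_SOURCE_IOCHK;
}

/* Usage example with the two values reported in this thread. */
void nmi_reason_example(void)
{
	/* 0x3d = 0011 1101 and 0x2d = 0010 1101: bits 7 and 6 are clear. */
	assert(classify_nmi_reason(0x3d) == NMI_SOURCE_UNKNOWN);
	assert(classify_nmi_reason(0x2d) == NMI_SOURCE_UNKNOWN);

	/* Made-up values with bit 7 or bit 6 set, for contrast. */
	assert(classify_nmi_reason(0xbd) == NMI_SOURCE_SERR);
	assert(classify_nmi_reason(0x7d) == NMI_SOURCE_IOCHK);
}
```

Under that reading, the change from 3d to 2d affects only low bits of the status byte and does not by itself point to a different source.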
* Re: v3.0: Weird kernel log message when resuming avout NMI received 2011-07-31 6:56 v3.0: Weird kernel log message when resuming avout NMI received Francis Moreau 2011-07-31 11:06 ` Cyrill Gorcunov @ 2011-08-02 19:52 ` Don Zickus 1 sibling, 0 replies; 15+ messages in thread From: Don Zickus @ 2011-08-02 19:52 UTC (permalink / raw) To: Francis Moreau; +Cc: Linux Kernel Mailing List On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote: > Hello, > > I'm seeing those kernel message when resuming: > > [ 524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0. > [ 524.973288] Do you have a strange power saving mode enabled? > [ 524.973289] Dazed and confused, but trying to continue > > I don't know if it's important or not because the system seems to work > after but maybe it worths to report Depends on what you were doing when this happened and if it is repeatable? If it is from perf/nmi_watchdog it probably isn't that big of a deal (though we should fix it). Otherwise it could be a hardware failure. Can you attach a dmesg log? Cheers, Don ^ permalink raw reply [flat|nested] 15+ messages in thread